Abstract
Documents indexed with controlled vocabularies enable users of libraries to discover relevant documents, even across language barriers. Due to the rapid growth of scientific publications, digital libraries require automatic methods that index documents accurately, especially with regard to explicit or implicit concept drift, that is, with respect to new descriptor terms and new types of documents, respectively. This paper first analyzes architectures of related approaches to automatic indexing. We show that their design determines individual strengths and weaknesses and justify research on their fusion. In particular, systems benefit from statistical associative components as well as from lexical components applying dictionary matching, ranking, and binary classification. The analysis emphasizes the importance of descriptor-invariant learning, that is, learning based on features which can be transferred between different descriptors. Theoretical and experimental results on economic titles and author keywords underline the relevance of the fusion methodology in terms of overall accuracy and adaptability to dynamic domains. Experiments show that fusion strategies combining a binary relevance approach and a thesaurus-based system outperform all other strategies on the tested data set. Our findings can help researchers and practitioners in digital libraries to choose appropriate methods for automatic indexing. © 2017 IEEE.
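To make the fusion idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy thesaurus, a toy training set, a simple word co-occurrence score standing in for a trained binary classifier, and fusion by set union; all names (`THESAURUS`, `lexical_match`, `BinaryRelevance`, `fuse`) are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy thesaurus: descriptor -> surface forms (labels and synonyms).
THESAURUS = {
    "Monetary policy": ["monetary policy", "central bank policy"],
    "Inflation": ["inflation", "price increase"],
    "Cryptocurrency": ["cryptocurrency", "bitcoin"],
}


def tokenize(title):
    """Lowercase, split on whitespace, and drop very short tokens (assumed heuristic)."""
    return {w for w in title.lower().split() if len(w) > 3}


def lexical_match(title, thesaurus=THESAURUS):
    """Dictionary matching: descriptors whose surface form occurs in the title.

    Needs no training examples, so it can also suggest descriptors
    that never appear in the training data.
    """
    text = title.lower()
    return {d for d, forms in thesaurus.items() if any(f in text for f in forms)}


class BinaryRelevance:
    """One independent yes/no decision per descriptor seen in training."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.word_count = Counter()             # titles containing each word
        self.pair_count = defaultdict(Counter)  # descriptor -> word -> co-occurrences

    def fit(self, titles, label_sets):
        for title, labels in zip(titles, label_sets):
            words = tokenize(title)
            self.word_count.update(words)
            for descriptor in labels:
                self.pair_count[descriptor].update(words)
        return self

    def predict(self, title):
        words = [w for w in tokenize(title) if w in self.word_count]
        predicted = set()
        for descriptor, counts in self.pair_count.items():
            # Score: highest estimate of P(descriptor | word) over known title words.
            score = max((counts[w] / self.word_count[w] for w in words), default=0.0)
            if score >= self.threshold:
                predicted.add(descriptor)
        return predicted


def fuse(title, model):
    """Fusion by union of the lexical and the statistical predictions (assumed strategy)."""
    return lexical_match(title) | model.predict(title)


# Toy training data (invented for illustration only).
train_titles = [
    "interest rates and the ecb",
    "ecb interest rates decisions",
    "rising consumer prices in europe",
]
train_labels = [{"Monetary policy"}, {"Monetary policy"}, {"Inflation"}]
model = BinaryRelevance().fit(train_titles, train_labels)

print(sorted(fuse("Bitcoin and interest rates", model)))
```

In this toy setting the statistical component contributes "Monetary policy" from the learned association with "interest" and "rates", while the lexical component contributes "Cryptocurrency", a descriptor with no training examples. This mirrors, in a simplified way, the complementary strengths the abstract attributes to the two kinds of components.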
Original language | English |
---|---|
Title of host publication | 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL) |
Place of Publication | Piscataway, NJ |
Publisher | IEEE |
ISBN (Electronic) | 978-1-5386-3861-3 |
ISBN (Print) | 978-1-5386-3862-0 |
Publication status | Published - 25 Jul 2017 |
Externally published | Yes |
Event | Joint Conference on Digital Libraries, JCDL 2017, University of Toronto, Toronto, Canada. Duration: 19 Jun 2017 → 23 Jun 2017. https://2017.jcdl.org/ |
Conference
Conference | Joint Conference on Digital Libraries, JCDL 2017 |
---|---|
Abbreviated title | JCDL 2017 |
Country/Territory | Canada |
City | Toronto |
Period | 19/06/17 → 23/06/17 |
Internet address | https://2017.jcdl.org/ |
Keywords
- Automatic subject indexing
- Keyphrase indexing
- Meta-learning
- Multi-label classification
- Short texts
- Zero-shot learning