Fusion architectures for automatic subject indexing under concept drift: Analysis and empirical results on short texts

Martin Toepfer, Christin Seifert

    Research output: Contribution to specialist publication › Article › Professional

    6 Citations (Scopus)
    560 Downloads (Pure)

    Abstract

    Indexing documents with controlled vocabularies enables a wealth of semantic applications for digital libraries. Due to the rapid growth of scientific publications, machine learning-based methods are required that assign subject descriptors automatically. While stability of the generative processes behind the underlying data is often tacitly assumed, it is violated in practice. Addressing this problem, this article studies explicit and implicit concept drift, that is, settings with new descriptor terms and new types of documents, respectively. First, the existence of concept drift in automatic subject indexing is discussed in detail and demonstrated by example. Subsequently, architectures for automatic indexing are analyzed in this regard, highlighting individual strengths and weaknesses. The results of the theoretical analysis justify research on the fusion of different indexing approaches, with special consideration of information sharing among descriptors. Experimental results on titles and author keywords in the domain of economics underline the relevance of the fusion methodology, especially under concept drift. Fusion approaches outperformed non-fusion strategies on the tested data sets, which comprised shifts in the priors of descriptors as well as in covariates. These findings can help researchers and practitioners in digital libraries to choose appropriate methods for automatic subject indexing, as is finally shown by a recent case study.
    Original language: English
    Pages: 169–189
    Number of pages: 21
    Volume: 21
    Specialist publication: International journal on digital libraries
    Publisher: Springer
    Publication status: Published - Jun 2020

    Keywords

    • Automatic subject indexing
    • Concept drift
    • Meta-learning
    • Multi-label classification
    • Short texts
