Non-Parametric Subject Prediction

Shenghui Wang*, Rob Koopman, Gwenn Englebienne

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    Abstract

    Automatic subject prediction is a desirable feature for modern digital library systems, as manual indexing can no longer cope with the rapid growth of digital collections. This is an “extreme multi-label classification” problem, where the objective is to assign a small subset of the most relevant subjects from an extremely large label set. Data sparsity and model scalability are the major challenges we need to address to solve it automatically. In this paper, we describe an efficient and effective embedding method that embeds terms, subjects and documents into the same semantic space, where similarity can be computed easily. We then propose a novel Non-Parametric Subject Prediction (NPSP) method and show how effectively it predicts even very specialised subjects, which are associated with few documents in the training set and are not predicted by state-of-the-art classifiers.

    Original language: English
    Title of host publication: Digital Libraries for Open Knowledge
    Subtitle of host publication: 23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Oslo, Norway, September 9-12, 2019, Proceedings
    Editors: Antoine Doucet, Antoine Isaac, Koraljka Golub, Trond Aalberg, Adam Jatowt
    Place of Publication: Cham
    Publisher: Springer
    Pages: 312-326
    Number of pages: 15
    ISBN (Electronic): 978-3-030-30760-8
    ISBN (Print): 978-3-030-30759-2
    DOIs: https://doi.org/10.1007/978-3-030-30760-8_27
    Publication status: Published - 1 Jan 2019
    Event: 23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019 - OsloMet, Oslo, Norway
    Duration: 9 Sep 2019 to 12 Sep 2019
    Conference number: 23
    http://www.tpdl.eu/tpdl2019/contributions/

    Publication series

    Name: Lecture Notes in Computer Science
    Publisher: Springer
    Volume: 11799
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019
    Abbreviated title: TPDL 2019
    Country: Norway
    City: Oslo
    Period: 9/09/19 to 12/09/19

    Keywords

    • Non-parametric method
    • Random projection
    • Semantic embedding
    • Subject prediction


    Cite this

    Wang, S., Koopman, R., & Englebienne, G. (2019). Non-Parametric Subject Prediction. In A. Doucet, A. Isaac, K. Golub, T. Aalberg, & A. Jatowt (Eds.), Digital Libraries for Open Knowledge: 23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Oslo, Norway, September 9-12, 2019, Proceedings (pp. 312-326). (Lecture Notes in Computer Science; Vol. 11799). Cham: Springer. https://doi.org/10.1007/978-3-030-30760-8_27