A two-step approach toward subject prediction

Rob Koopman, Shenghui Wang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review


Automatic subject prediction is a desirable feature for modern digital library systems, as manual indexing can no longer cope with the rapid growth of digital collections. Data sparsity and model scalability are the major challenges in solving this extreme multi-label classification problem automatically. In this research-in-progress paper, we propose to address this problem using a two-step approach. We first propose an efficient and effective embedding method that embeds terms, subjects and documents into the same semantic space, where similarity can be computed easily. We then describe a novel Non-Parametric Subject Prediction (NPSP) method and show how effectively it predicts even very specialised subjects, which are associated with few documents in the training set and are therefore more problematic for a classifier.
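The abstract does not spell out how NPSP works, so the following is only a minimal sketch of the general idea of predicting subjects by similarity in a shared embedding space. It is not the authors' method. Everything in it is an assumption made for illustration: the toy two-dimensional term vectors, the averaging used in `embed_document`, and the similarity-weighted nearest-neighbour vote in `predict_subjects`. Unlike the approach described above, the sketch does not embed the subjects themselves.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embed_document(terms, term_vectors):
    """Hypothetical document embedding: the mean of its known term vectors."""
    known = [term_vectors[t] for t in terms if t in term_vectors]
    if not known:
        raise ValueError("document contains no known terms")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def predict_subjects(doc_vec, training, k=2, top_n=1):
    """Rank subjects by summed similarity over the k nearest training documents.

    `training` is a list of (document_vector, subject_list) pairs.
    No model is fitted: prediction is a lookup over labelled examples.
    """
    neighbours = sorted(
        training, key=lambda item: cosine(doc_vec, item[0]), reverse=True
    )[:k]
    scores = defaultdict(float)
    for vec, subjects in neighbours:
        sim = cosine(doc_vec, vec)
        for subject in subjects:
            scores[subject] += sim
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [subject for subject, _ in ranked[:top_n]]


# Toy data (assumed, not from the paper): 2-d vectors for four terms.
TERM_VECTORS = {
    "library": [1.0, 0.0],
    "catalogue": [0.9, 0.1],
    "neuron": [0.0, 1.0],
    "synapse": [0.1, 0.9],
}

# Labelled training documents, embedded into the same space as the terms.
TRAINING = [
    (embed_document(["library", "catalogue"], TERM_VECTORS), ["Library science"]),
    (embed_document(["neuron", "synapse"], TERM_VECTORS), ["Neuroscience"]),
    (embed_document(["neuron"], TERM_VECTORS), ["Neuroscience", "Cell biology"]),
]

query = embed_document(["synapse", "neuron"], TERM_VECTORS)
print(predict_subjects(query, TRAINING, k=2, top_n=2))
```

Because a subject is scored from whichever labelled neighbours happen to be close to the query, a subject attached to a single training document can still be returned, which is the kind of sparse case the abstract singles out as hard for a trained classifier.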

Original language: English
Title of host publication: 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings
Subtitle of host publication: 2-5 September 2019, Sapienza University of Rome, Italy
Editors: Giuseppe Catalano, Cinzia Daraio, Martina Gregori, Henk F. Moed, Giancarlo Ruocco
Publisher: International Society for Scientometrics and Informetrics
Number of pages: 6
ISBN (Electronic): 978-88-3381-118-5
Publication status: Published - 2019
Event: 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Rome, Italy
Duration: 2 Sept 2019 to 5 Sept 2019
Conference number: 17


Conference: 17th International Conference on Scientometrics and Informetrics, ISSI 2019
Abbreviated title: ISSI

