A two-step approach toward subject prediction

Rob Koopman, Shenghui Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Automatic subject prediction is a desirable feature for modern digital library systems, as manual indexing can no longer cope with the rapid growth of digital collections. Data sparsity and model scalability are the major challenges to solving this extreme multi-label classification problem automatically. In this research-in-progress paper, we propose to address this problem using a two-step approach. We first propose an efficient and effective embedding method that embeds terms, subjects and documents into the same semantic space, where similarity can be computed easily. We then describe a novel Non-Parametric Subject Prediction (NPSP) method and show how effectively it predicts even very specialised subjects, which are associated with few documents in the training set and are therefore more problematic for a classifier.
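
The abstract does not spell out how NPSP works, so the following is only a minimal sketch of the general idea it describes: once documents and subjects live in one semantic space, subjects for a new document can be suggested from its most similar labelled neighbours. The sketch uses a plain similarity-weighted nearest-neighbour vote, which is an assumption and not the authors' method; the names `cosine`, `predict_subjects` and `TRAIN_DOCS`, the parameters `k` and `top_n`, and the toy vectors are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_subjects(query_vec, train_docs, k=3, top_n=2):
    """Suggest subjects for query_vec from its k most similar training documents.

    train_docs is a list of (vector, set_of_subjects) pairs. Each neighbour
    votes for its subjects with a weight equal to its cosine similarity.
    This is an illustrative nearest-neighbour vote, not the paper's NPSP.
    """
    neighbours = sorted(
        ((cosine(query_vec, vec), subjects) for vec, subjects in train_docs),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    votes = defaultdict(float)
    for similarity, subjects in neighbours:
        if similarity <= 0.0:
            continue  # ignore neighbours that are not similar at all
        for subject in subjects:
            votes[subject] += similarity

    # Highest vote first; ties are broken alphabetically for determinism.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [subject for subject, _ in ranked[:top_n]]


# Toy 3-dimensional "embeddings"; real ones would come from the embedding step.
TRAIN_DOCS = [
    ([0.9, 0.1, 0.0], {"bibliometrics"}),
    ([0.8, 0.2, 0.1], {"bibliometrics", "citation analysis"}),
    ([0.1, 0.9, 0.2], {"machine learning"}),
    ([0.0, 0.2, 0.9], {"marine biology"}),
]

if __name__ == "__main__":
    print(predict_subjects([1.0, 0.0, 0.0], TRAIN_DOCS, k=2, top_n=2))
    print(predict_subjects([0.0, 0.1, 1.0], TRAIN_DOCS, k=1, top_n=1))
```

Because the vote needs no per-subject model, a subject attached to a single training document (here "marine biology") can still be suggested when the query lands close to that document, which is the kind of sparsely represented subject the abstract highlights.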

Original language: English
Title of host publication: 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings
Subtitle of host publication: 2-5 September 2019, Sapienza University of Rome, Italy
Editors: Giuseppe Catalano, Cinzia Daraio, Martina Gregori, Henk F. Moed, Giancarlo Ruocco
Publisher: International Society for Scientometrics and Informetrics
Pages: 1038-1043
Number of pages: 6
Volume: I
ISBN (Electronic): 978-88-3381-118-5
Publication status: Published - 2019
Event: 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Rome, Italy
Duration: 2 Sep 2019 → 5 Sep 2019
Conference number: 17

Conference

Conference: 17th International Conference on Scientometrics and Informetrics, ISSI 2019
Abbreviated title: ISSI
Country: Italy
City: Rome
Period: 2/09/19 → 5/09/19
