Abstract
Automatic subject prediction is a desirable feature for modern digital library systems, as manual indexing can no longer cope with the rapid growth of digital collections. This is an “extreme multi-label classification” problem, where the objective is to assign a small subset of the most relevant subjects from an extremely large label set. Data sparsity and model scalability are the major challenges we need to address to solve it automatically. In this paper, we describe an efficient and effective embedding method that embeds terms, subjects and documents into the same semantic space, where similarity can be computed easily. We then propose a novel Non-Parametric Subject Prediction (NPSP) method and show how effectively it predicts even very specialised subjects, which are associated with few documents in the training set and are not predicted by state-of-the-art classifiers.
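The abstract's central idea, placing terms, subjects and documents in one vector space and predicting subjects by similarity, can be illustrated with a minimal sketch. This is not the authors' NPSP implementation: the Gaussian projection matrix, the bag-of-words document representation, the summing of document vectors into subject vectors, and the cosine ranking are all assumptions chosen for illustration, and every function name and the toy corpus below are hypothetical.

```python
import numpy as np


def build_projection(vocab, dim=256, seed=0):
    """Assumed setup: a Gaussian random projection with one row per term."""
    rng = np.random.default_rng(seed)
    index = {term: i for i, term in enumerate(vocab)}
    matrix = rng.standard_normal((len(vocab), dim)) / np.sqrt(dim)
    return index, matrix


def embed_text(tokens, index, matrix):
    """Project a bag-of-words count vector into the shared space (unit length)."""
    counts = np.zeros(matrix.shape[0])
    for tok in tokens:
        if tok in index:
            counts[index[tok]] += 1.0
    vec = counts @ matrix
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def embed_subjects(labelled_docs, index, matrix):
    """Represent each subject by the normalised sum of its documents' vectors."""
    sums = {}
    for tokens, subjects in labelled_docs:
        doc_vec = embed_text(tokens, index, matrix)
        for subject in subjects:
            sums[subject] = sums.get(subject, np.zeros(matrix.shape[1])) + doc_vec
    result = {}
    for subject, vec in sums.items():
        norm = np.linalg.norm(vec)
        if norm > 0:
            result[subject] = vec / norm
    return result


def predict_subjects(tokens, index, matrix, subject_vecs, k=3):
    """Rank subjects by cosine similarity to the query document."""
    query = embed_text(tokens, index, matrix)
    ranked = sorted(
        subject_vecs.items(),
        key=lambda item: float(query @ item[1]),
        reverse=True,
    )
    return [subject for subject, _ in ranked[:k]]


# Hypothetical toy corpus: (tokens, subjects) pairs.
training = [
    (["neural", "network", "training"], ["machine learning"]),
    (["medieval", "manuscript", "archive"], ["history"]),
    (["neural", "embedding"], ["machine learning"]),
]
vocab = sorted({tok for tokens, _ in training for tok in tokens})
index, matrix = build_projection(vocab)
subject_vecs = embed_subjects(training, index, matrix)
predicted = predict_subjects(["neural", "network"], index, matrix, subject_vecs, k=1)
```

Because no per-subject classifier is trained in this sketch, a subject attached to only one or two training documents still receives a vector and can be ranked, which is the kind of behaviour the abstract attributes to a non-parametric approach.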
Original language | English |
---|---|
Title of host publication | Digital Libraries for Open Knowledge |
Subtitle of host publication | 23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019, Oslo, Norway, September 9-12, 2019, Proceedings |
Editors | Antoine Doucet, Antoine Isaac, Koraljka Golub, Trond Aalberg, Adam Jatowt |
Place of Publication | Cham |
Publisher | Springer |
Pages | 312-326 |
Number of pages | 15 |
ISBN (Electronic) | 978-3-030-30760-8 |
ISBN (Print) | 978-3-030-30759-2 |
Publication status | Published - 1 Jan 2019 |
Event | 23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019 - OsloMet, Oslo, Norway; Duration: 9 Sept 2019 → 12 Sept 2019; Conference number: 23; http://www.tpdl.eu/tpdl2019/contributions/ |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer |
Volume | 11799 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 23rd International Conference on Theory and Practice of Digital Libraries, TPDL 2019 |
---|---|
Abbreviated title | TPDL 2019 |
Country/Territory | Norway |
City | Oslo |
Period | 9/09/19 → 12/09/19 |
Keywords
- Non-parametric method
- Random projection
- Semantic embedding
- Subject prediction