Abstract
Semantic annotations have to satisfy quality constraints to be useful for digital libraries, which is particularly challenging on large and diverse datasets. Confidence scores of multi-label classification methods typically refer only to the relevance of particular subjects, disregarding indicators of insufficient content representation at the document level. Therefore, we propose a novel approach that identifies the documents, rather than the individual concepts, for which quality criteria are met. Our approach uses a deep, multi-layered regression architecture that incorporates a variety of content-based indicators. We evaluated multiple configurations using text collections from law and economics, where the available content is restricted to very short texts. Notably, we demonstrate that the proposed quality estimation technique can identify subsets of previously unseen data on which considerable gains in document-level recall can be achieved while precision is maintained. Hence, the approach effectively acts as a filter that ensures high data quality standards in operational information retrieval systems.
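The abstract describes the document-level filtering only at a high level. As a minimal sketch, assuming a trained regressor has already assigned each document an estimated quality score, the selection step and the document-level precision and recall it is meant to improve could look as follows. The function names, the threshold of 0.5, and the toy data are hypothetical, and the deep regression model itself is not reproduced here.

```python
from __future__ import annotations

from statistics import mean


def doc_precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Document-level precision and recall of one predicted subject set."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def mean_scores(doc_ids, predictions, gold) -> tuple[float, float]:
    """Average document-level precision and recall over the given documents."""
    pairs = [doc_precision_recall(predictions[d], gold[d]) for d in doc_ids]
    return mean(p for p, _ in pairs), mean(r for _, r in pairs)


def filter_by_quality(doc_ids, quality, threshold: float) -> list[str]:
    """Keep only documents whose estimated quality reaches the threshold.

    Assumption: `quality` holds one score per document, e.g. produced by a
    regression model over content-based indicators (hypothetical here).
    """
    return [d for d in doc_ids if quality[d] >= threshold]


# Toy data invented for illustration: gold subjects, predicted subjects,
# and an estimated quality score per document.
gold = {"d1": {"a", "b"}, "d2": {"c", "d"}, "d3": {"e"}}
predictions = {"d1": {"a", "b"}, "d2": {"x"}, "d3": {"e"}}
quality = {"d1": 0.9, "d2": 0.2, "d3": 0.8}

all_docs = sorted(gold)
kept = filter_by_quality(all_docs, quality, threshold=0.5)

print(mean_scores(all_docs, predictions, gold))  # about 0.667 for both
print(kept)                                      # ['d1', 'd3']
print(mean_scores(kept, predictions, gold))      # (1.0, 1.0)
```

On the toy data, discarding the low-scoring document d2 raises mean document-level recall from about 0.67 to 1.0 without lowering precision, which is the kind of trade-off the abstract describes. In practice, the threshold would have to be tuned on held-out data rather than fixed in advance.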
Original language | English |
---|---|
Title of host publication | Digital Libraries for Open Knowledge - 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018, Proceedings |
Editors | Eva Mendez, Cristina Ribeiro, Gabriel David, João Correia Lopes, Fabio Crestani |
Pages | 3-15 |
Number of pages | 13 |
Publication status | Published - 2018 |
Event | 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018, University of Porto, Faculty of Engineering, Porto, Portugal. Duration: 10 Sept 2018 → 13 Sept 2018. Conference number: 22. http://www.tpdl.eu/tpdl2018/ |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 11057 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018 |
---|---|
Abbreviated title | TPDL |
Country/Territory | Portugal |
City | Porto |
Period | 10/09/18 → 13/09/18 |
Internet address | http://www.tpdl.eu/tpdl2018/ |