Abstract
Learning something new from a text requires the reader to build on existing knowledge and add new material at the same time. Therefore, we propose collection-document summaries (CDS) that highlight commonalities and differences between a collection (or a single document) and a single document. We devise evaluation metrics that do not require human judgement, and three algorithms for extracting CDS that are based on single-document keyword-extraction methods. Our evaluation shows that different algorithms have different strengths: for example, the TF-IDF-based approach best describes document overlap, while the adaptation of Rake provides keywords with broad topical coverage. The proposed criteria and procedure can be used to evaluate document-collection summaries without annotated corpora, or to provide additional insight in an evaluation with human-generated ground truth.
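As a rough illustration of the idea in the abstract, the sketch below ranks the terms of a single document by TF-IDF and splits them into terms the document shares with a collection and terms that are new relative to it. This is not the authors' algorithm, which this record does not describe. The function names (`tokenize`, `tfidf`, `cds_keywords`), the smoothed IDF formula, the crude tokenizer, and the vocabulary-membership test for "common" versus "novel" are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(doc_tokens, background):
    """TF-IDF weights for one document; IDF is computed over `background`,
    a list of token lists. The smoothing used here is an assumption."""
    n_docs = len(background)
    vocab_per_doc = [set(tokens) for tokens in background]
    weights = {}
    for term, count in Counter(doc_tokens).items():
        df = sum(1 for vocab in vocab_per_doc if term in vocab)
        # Smoothed IDF: terms found in every document keep a small positive weight.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[term] = (count / len(doc_tokens)) * idf
    return weights


def cds_keywords(collection, document, k=3):
    """Return (common, novel): the top-k document terms that also occur in the
    collection, and the top-k document terms that do not (hypothetical helper)."""
    coll_tokens = [tokenize(text) for text in collection]
    doc_tokens = tokenize(document)
    weights = tfidf(doc_tokens, coll_tokens + [doc_tokens])
    coll_vocab = set().union(*coll_tokens)
    # Sort by descending weight, breaking ties alphabetically for determinism.
    ranked = sorted(weights, key=lambda term: (-weights[term], term))
    common = [term for term in ranked if term in coll_vocab][:k]
    novel = [term for term in ranked if term not in coll_vocab][:k]
    return common, novel


collection = [
    "ranking models for web search",
    "query logs improve search ranking",
]
document = "neural ranking for search uses neural embeddings"
common, novel = cds_keywords(collection, document)
```

The sketch applies no stop-word filtering, so function words such as "for" can appear among the common terms; a real keyword extractor would normally remove them first.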
Original language | English |
---|---|
Title of host publication | Advances in Information Retrieval - 40th European Conference on IR Research, ECIR 2018, Proceedings |
Subtitle of host publication | 40th European Conference on IR Research, ECIR 2018, Grenoble, France, March 26-29, 2018, Proceedings |
Editors | Gabriella Pasi, Benjamin Piwowarski, Leif Azzopardi, Allan Hanbury |
Publisher | Springer |
Pages | 638-643 |
Number of pages | 6 |
ISBN (Electronic) | 978-3-319-76941-7 |
ISBN (Print) | 978-3-319-76940-0 |
Publication status | Published - 1 Jan 2018 |
Externally published | Yes |
Event | 40th European Conference on Information Retrieval 2018, Grenoble, France. Duration: 26 Mar 2018 → 29 Mar 2018. Conference number: 40. http://www.ecir2018.org/ |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer |
Volume | 10772 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 40th European Conference on Information Retrieval 2018 |
---|---|
Abbreviated title | ECIR 2018 |
Country/Territory | France |
City | Grenoble |
Period | 26/03/18 → 29/03/18 |
Keywords
- Collection-document summaries
- Text summarization