Abstract
Learning something new from a text requires the reader to build on existing knowledge while taking in new material. We therefore propose collection-document summaries (CDS), which highlight commonalities and differences between a collection (or a single document) and a single document. We devise evaluation metrics that do not require human judgement, as well as three algorithms for extracting CDS that build on single-document keyword-extraction methods. Our evaluation shows that the algorithms have different strengths: for example, the TF-IDF-based approach best describes document overlap, while the adaptation of RAKE provides keywords with broad topical coverage. The proposed criteria and procedure can be used to evaluate collection-document summaries without annotated corpora, or to provide additional insight in an evaluation with human-generated ground truth.
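The abstract does not spell out how the TF-IDF-based extractor works. Purely as an illustration, the sketch below shows one way a single-document TF-IDF keyword ranking could be split into "commonalities" (terms the document shares with the collection) and "differences" (terms found only in the document). Everything here is an assumption and not the authors' method: the function names `tokenize`, `tfidf` and `cds_keywords`, the regex tokenizer, the smoothed IDF variant (the one scikit-learn uses by default), and the rule that a term is "shared" whenever it occurs anywhere in the collection.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep alphabetic runs only, no stopword removal.
    return re.findall(r"[a-z]+", text.lower())


def tfidf(doc_tokens, corpus):
    # Score each term of one document by relative term frequency times smoothed IDF,
    # idf = ln((1 + N) / (1 + df)) + 1, where df counts corpus documents containing the term.
    doc_sets = [set(d) for d in corpus]
    n_docs = len(doc_sets)
    counts = Counter(doc_tokens)
    scores = {}
    for term, count in counts.items():
        df = sum(1 for d in doc_sets if term in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[term] = (count / len(doc_tokens)) * idf
    return scores


def cds_keywords(collection, document, k=5):
    # Hypothetical CDS extractor: rank the document's terms by TF-IDF, then split them
    # into terms also present in the collection (commonalities) and terms absent from it
    # (differences). Ties are broken alphabetically so the output is deterministic.
    coll_tokens = [tokenize(text) for text in collection]
    doc_tokens = tokenize(document)
    scores = tfidf(doc_tokens, coll_tokens + [doc_tokens])
    coll_vocab = set().union(*coll_tokens)

    def rank(terms):
        return sorted(terms, key=lambda t: (-scores[t], t))[:k]

    common = rank(t for t in scores if t in coll_vocab)
    distinct = rank(t for t in scores if t not in coll_vocab)
    return common, distinct


if __name__ == "__main__":
    collection = [
        "neural networks learn representations",
        "retrieval models rank documents",
    ]
    document = "neural retrieval with keyword extraction"
    common, distinct = cds_keywords(collection, document)
    print("commonalities:", common)
    print("differences:", distinct)
```

In this toy run, `neural` and `retrieval` come out as commonalities and `extraction`, `keyword` and `with` as differences. The inclusion of `with` shows why a real extractor would at least need stopword filtering.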
| Original language | English |
|---|---|
| Title of host publication | Advances in Information Retrieval - 40th European Conference on IR Research, ECIR 2018, Proceedings |
| Subtitle of host publication | 40th European Conference on IR Research, ECIR 2018, Grenoble, France, March 26-29, 2018, Proceedings |
| Editors | Gabriella Pasi, Benjamin Piwowarski, Leif Azzopardi, Allan Hanbury |
| Publisher | Springer |
| Pages | 638-643 |
| Number of pages | 6 |
| ISBN (Electronic) | 978-3-319-76941-7 |
| ISBN (Print) | 978-3-319-76940-0 |
| DOIs | |
| Publication status | Published - 1 Jan 2018 |
| Externally published | Yes |
| Event | 40th European Conference on Information Retrieval 2018, Grenoble, France (duration: 26 Mar 2018 → 29 Mar 2018; conference number: 40; http://www.ecir2018.org/) |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Publisher | Springer |
| Volume | 10772 |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 40th European Conference on Information Retrieval 2018 |
|---|---|
| Abbreviated title | ECIR 2018 |
| Country/Territory | France |
| City | Grenoble |
| Period | 26/03/18 → 29/03/18 |
| Internet address | http://www.ecir2018.org/ |
Keywords
- Collection-document summaries
- Text summarization