Collection-document summaries

Nils Witt, Michael Granitzer, Christin Seifert

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


Abstract

Learning something new from a text requires the reader to build on existing knowledge and add new material at the same time. Therefore, we propose collection-document summaries (CDS) that highlight commonalities and differences between a collection (or a single document) and a single document. We devise evaluation metrics that do not require human judgement, and three algorithms for extracting CDS that are based on single-document keyword-extraction methods. Our evaluation shows that different algorithms have different strengths, e.g. the TF-IDF-based approach best describes document overlap, while the adaptation of RAKE provides keywords with broad topical coverage. The proposed criteria and procedure can be used to evaluate document-collection summaries without annotated corpora, or to provide additional insight in an evaluation with human-generated ground truth.
Original language: English
Title of host publication: Advances in Information Retrieval - 40th European Conference on IR Research, ECIR 2018, Proceedings
Subtitle of host publication: 40th European Conference on IR Research, ECIR 2018, Grenoble, France, March 26-29, 2018, Proceedings
Editors: Gabriella Pasi, Benjamin Piwowarski, Leif Azzopardi, Allan Hanbury
Publisher: Springer
Pages: 638-643
Number of pages: 6
ISBN (Electronic): 978-3-319-76941-7
ISBN (Print): 978-3-319-76940-0
DOIs
Publication status: Published - 1 Jan 2018
Externally published: Yes
Event: 40th European Conference on Information Retrieval 2018 - Grenoble, France
Duration: 26 Mar 2018 to 29 Mar 2018
Conference number: 40
http://www.ecir2018.org/

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 10772
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 40th European Conference on Information Retrieval 2018
Abbreviated title: ECIR 2018
Country/Territory: France
City: Grenoble
Period: 26/03/18 to 29/03/18
Internet address

Keywords

  • Collection-document summaries
  • Text summarization
