Collection-document summaries

Nils Witt, Michael Granitzer, Christin Seifert

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic > peer-review



    Learning something new from a text requires the reader to build on existing knowledge and add new material at the same time. Therefore, we propose collection-document summaries (CDS) that highlight commonalities and differences between a collection (or a single document) and a single document. We devise evaluation metrics that do not require human judgement, and three algorithms for extracting CDS that are based on single-document keyword-extraction methods. Our evaluation shows that different algorithms have different strengths, e.g. the TF-IDF-based approach best describes document overlap, while the adaptation of RAKE provides keywords with broad topical coverage. The proposed criteria and procedure can be used to evaluate document-collection summaries without annotated corpora or to provide additional insight in an evaluation with human-generated ground truth.
    Original language: English
    Title of host publication: Advances in Information Retrieval - 40th European Conference on IR Research, ECIR 2018, Proceedings
    Subtitle of host publication: 40th European Conference on IR Research, ECIR 2018, Grenoble, France, March 26-29, 2018, Proceedings
    Editors: Gabriella Pasi, Benjamin Piwowarski, Leif Azzopardi, Allan Hanbury
    Number of pages: 6
    ISBN (Electronic): 978-3-319-76941-7
    ISBN (Print): 978-3-319-76940-0
    Publication status: Published - 1 Jan 2018
    Event: 40th European Conference on Information Retrieval 2018 - Grenoble, France
    Duration: 26 Mar 2018 - 29 Mar 2018
    Conference number: 40

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 10772 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349


    Conference: 40th European Conference on Information Retrieval 2018
    Abbreviated title: ECIR 2018


    • Collection-document summaries
    • Text summarization
