Collection-document summaries

Nils Witt, Michael Granitzer, Christin Seifert

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    Abstract

    Learning something new from a text requires the reader to build on existing knowledge and add new material at the same time. Therefore, we propose collection-document summaries (CDS) that highlight commonalities and differences between a collection (or a single document) and a single document. We devise evaluation metrics that do not require human judgement, and three algorithms for extracting CDS that are based on single-document keyword-extraction methods. Our evaluation shows that different algorithms have different strengths, e.g. the TF-IDF-based approach best describes document overlap, while the adaptation of Rake provides keywords with broad topical coverage. The proposed criteria and procedure can be used to evaluate document-collection summaries without annotated corpora, or to provide additional insight in an evaluation with human-generated ground truth.
    Original language: English
    Title of host publication: Advances in Information Retrieval - 40th European Conference on IR Research, ECIR 2018, Proceedings
    Subtitle of host publication: 40th European Conference on IR Research, ECIR 2018, Grenoble, France, March 26-29, 2018, Proceedings
    Editors: Gabriella Pasi, Benjamin Piwowarski, Leif Azzopardi, Allan Hanbury
    Publisher: Springer
    Pages: 638-643
    Number of pages: 6
    ISBN (Electronic): 978-3-319-76941-7
    ISBN (Print): 978-3-319-76940-0
    DOIs: https://doi.org/10.1007/978-3-319-76941-7_56
    Publication status: Published - 1 Jan 2018
    Event: 40th European Conference on Information Retrieval 2018 - Grenoble, France
    Duration: 26 Mar 2018 - 29 Mar 2018
    Conference number: 40
    http://www.ecir2018.org/

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 10772 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 40th European Conference on Information Retrieval 2018
    Abbreviated title: ECIR 2018
    Country: France
    City: Grenoble
    Period: 26/03/18 - 29/03/18
    Internet address: http://www.ecir2018.org/

    Keywords

    • Collection-document summaries
    • Text summarization

    Cite this

    Witt, N., Granitzer, M., & Seifert, C. (2018). Collection-document summaries. In G. Pasi, B. Piwowarski, L. Azzopardi, & A. Hanbury (Eds.), Advances in Information Retrieval - 40th European Conference on IR Research, ECIR 2018, Proceedings: 40th European Conference on IR Research, ECIR 2018, Grenoble, France, March 26-29, 2018, Proceedings (pp. 638-643). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10772 LNCS). Springer. https://doi.org/10.1007/978-3-319-76941-7_56
    Witt, Nils ; Granitzer, Michael ; Seifert, Christin. / Collection-document summaries. Advances in Information Retrieval - 40th European Conference on IR Research, ECIR 2018, Proceedings: 40th European Conference on IR Research, ECIR 2018, Grenoble, France, March 26-29, 2018, Proceedings. editor / Gabriella Pasi ; Benjamin Piwowarski ; Leif Azzopardi ; Allan Hanbury. Springer, 2018. pp. 638-643 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
    @inproceedings{fad757bbd93e49a983038c4e527b97cf,
    title = "Collection-document summaries",
    abstract = "Learning something new from a text requires the reader to build on existing knowledge and add new material at the same time. Therefore, we propose collection-document summaries (CDS) that highlight commonalities and differences between a collection (or a single document) and a single document. We devise evaluation metrics that do not require human judgement, and three algorithms for extracting CDS that are based on single-document keyword-extraction methods. Our evaluation shows that different algorithms have different strengths, e.g. the TF-IDF-based approach best describes document overlap, while the adaptation of Rake provides keywords with broad topical coverage. The proposed criteria and procedure can be used to evaluate document-collection summaries without annotated corpora, or to provide additional insight in an evaluation with human-generated ground truth.",
    keywords = "Collection-document summaries, Text summarization",
    author = "Nils Witt and Michael Granitzer and Christin Seifert",
    year = "2018",
    month = "1",
    day = "1",
    doi = "10.1007/978-3-319-76941-7_56",
    language = "English",
    isbn = "978-3-319-76940-0",
    series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
    publisher = "Springer",
    pages = "638--643",
    editor = "Gabriella Pasi and Benjamin Piwowarski and Leif Azzopardi and Allan Hanbury",
    booktitle = "Advances in Information Retrieval - 40th European Conference on IR Research, ECIR 2018, Proceedings",
    volume = "10772",
    }

    Witt, N, Granitzer, M & Seifert, C 2018, Collection-document summaries. in G Pasi, B Piwowarski, L Azzopardi & A Hanbury (eds), Advances in Information Retrieval - 40th European Conference on IR Research, ECIR 2018, Proceedings: 40th European Conference on IR Research, ECIR 2018, Grenoble, France, March 26-29, 2018, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10772 LNCS, Springer, pp. 638-643, 40th European Conference on Information Retrieval 2018, Grenoble, France, 26/03/18. https://doi.org/10.1007/978-3-319-76941-7_56

    Collection-document summaries. / Witt, Nils; Granitzer, Michael; Seifert, Christin.

    Advances in Information Retrieval - 40th European Conference on IR Research, ECIR 2018, Proceedings: 40th European Conference on IR Research, ECIR 2018, Grenoble, France, March 26-29, 2018, Proceedings. ed. / Gabriella Pasi; Benjamin Piwowarski; Leif Azzopardi; Allan Hanbury. Springer, 2018. p. 638-643 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10772 LNCS).

    TY - GEN

    T1 - Collection-document summaries

    AU - Witt, Nils

    AU - Granitzer, Michael

    AU - Seifert, Christin

    PY - 2018/1/1

    Y1 - 2018/1/1

    AB - Learning something new from a text requires the reader to build on existing knowledge and add new material at the same time. Therefore, we propose collection-document summaries (CDS) that highlight commonalities and differences between a collection (or a single document) and a single document. We devise evaluation metrics that do not require human judgement, and three algorithms for extracting CDS that are based on single-document keyword-extraction methods. Our evaluation shows that different algorithms have different strengths, e.g. the TF-IDF-based approach best describes document overlap, while the adaptation of Rake provides keywords with broad topical coverage. The proposed criteria and procedure can be used to evaluate document-collection summaries without annotated corpora, or to provide additional insight in an evaluation with human-generated ground truth.

    KW - Collection-document summaries

    KW - Text summarization

    U2 - 10.1007/978-3-319-76941-7_56

    DO - 10.1007/978-3-319-76941-7_56

    M3 - Conference contribution

    SN - 978-3-319-76940-0

    T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

    SP - 638

    EP - 643

    BT - Advances in Information Retrieval - 40th European Conference on IR Research, ECIR 2018, Proceedings

    A2 - Pasi, Gabriella

    A2 - Piwowarski, Benjamin

    A2 - Azzopardi, Leif

    A2 - Hanbury, Allan

    PB - Springer

    ER -

    Witt N, Granitzer M, Seifert C. Collection-document summaries. In Pasi G, Piwowarski B, Azzopardi L, Hanbury A, editors, Advances in Information Retrieval - 40th European Conference on IR Research, ECIR 2018, Proceedings: 40th European Conference on IR Research, ECIR 2018, Grenoble, France, March 26-29, 2018, Proceedings. Springer. 2018. p. 638-643. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-76941-7_56