Most Important First – Keyphrase Scoring for Improved Ranking in Settings With Limited Keyphrases

Nils Witt, Michael Granitzer, Christin Seifert

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

    Abstract

    Automatic keyphrase extraction attempts to capture keywords that accurately and extensively describe the document while being comprehensive at the same time. Unsupervised algorithms for extractive keyphrase extraction, i.e. those that filter the keyphrases from the text without external knowledge, generally suffer from low precision and low recall. In this paper, we propose a scoring of the extracted keyphrases as post-processing to rerank the list of extracted phrases in order to improve precision and recall particularly for the top phrases. The approach is based on the tf-idf score of the keyphrases and is agnostic of the underlying method used for the initial extraction of the keyphrases. Experiments show an increase of up to 14% at 5 keyphrases in the F1-metric on the most difficult corpus out of 4 corpora. We also show that this increase is mostly due to an increase on documents with very low F1-scores. Thus, our scoring and aggregation approach seems to be a promising way for robust, unsupervised keyphrase extraction with a special focus on the most important keyphrases.
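The abstract does not state the scoring formula, so the following is only a minimal sketch of the general idea: take the candidate list produced by any extractor and rerank it by a phrase-level tf-idf score. It assumes whitespace tokenization, exact n-gram matching, and a smoothed idf of the form log((1 + N) / (1 + df)) + 1. The function names are hypothetical and this is not the authors' implementation. The helper `f1_at_k` illustrates the kind of top-k F1 measurement the abstract reports at 5 keyphrases.

```python
import math


def count_phrase(phrase_tokens, tokens):
    """Count exact occurrences of a token n-gram inside a token list."""
    n = len(phrase_tokens)
    return sum(
        1
        for i in range(len(tokens) - n + 1)
        if tokens[i:i + n] == phrase_tokens
    )


def rerank_by_tfidf(candidates, doc, corpus):
    """Rerank extractor output by phrase tf-idf (assumed scoring, see lead-in).

    candidates: phrases in the order returned by the underlying extractor
    doc:        text of the document the phrases were extracted from
    corpus:     list of background document texts used for document frequency
    """
    doc_tokens = doc.lower().split()
    corpus_tokens = [text.lower().split() for text in corpus]
    n_docs = len(corpus_tokens)

    scored = []
    for phrase in candidates:
        phrase_tokens = phrase.lower().split()
        tf = count_phrase(phrase_tokens, doc_tokens)
        df = sum(1 for toks in corpus_tokens if count_phrase(phrase_tokens, toks) > 0)
        # Smoothed idf (an assumption): avoids division by zero for unseen phrases.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scored.append((tf * idf, phrase))

    # Python's sort is stable, so ties keep the extractor's original order.
    scored.sort(key=lambda item: -item[0])
    return [phrase for _, phrase in scored]


def f1_at_k(ranked, gold, k):
    """F1 of the top-k ranked phrases against a set of gold keyphrases."""
    top = ranked[:k]
    hits = len(set(top) & set(gold))
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Because the reranking only consumes a list of candidate phrases, it can sit behind any extraction method, which matches the method-agnostic claim in the abstract.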
    Original language: English
    Title of host publication: Discovery Science
    Subtitle of host publication: 21st International Conference, DS 2018, Limassol, Cyprus, October 29–31, 2018, Proceedings
    Editors: Larisa Soldatova, Joaquin Vanschoren, George Papadopoulos, Michelangelo Ceci
    Publisher: Springer
    Pages: 373-385
    Number of pages: 13
    ISBN (Electronic): 978-3-030-01771-2
    ISBN (Print): 978-3-030-01770-5
    DOIs
    Publication status: Published - 7 Oct 2018
    Event: 21st International Conference on Discovery Science 2018 - St. Raphael Resort, Limassol, Cyprus
    Duration: 29 Oct 2018 – 31 Oct 2018
    Conference number: 21
    http://www.cyprusconferences.org/ds2018/

    Publication series

    Name: Lecture notes in computer science
    Volume: 11198

    Conference

    Conference: 21st International Conference on Discovery Science 2018
    Abbreviated title: DS 2018
    Country: Cyprus
    City: Limassol
    Period: 29/10/18 – 31/10/18
    Internet address
