Context Resolution Strategies for Automatic Wikipedia Learning

Michael Granitzer, Christin Seifert, Mario Zechner

    Research output: Contribution to conference › Paper › Academic › peer-review

    Abstract

    Automatic linking of Wikipedia pages mostly follows one of two strategies: (i) a content-based strategy that relies on word similarities, or (ii) a structural strategy that exploits link characteristics. Our approach is content-based: we find anchors using the titles of candidate Wikipedia pages and resolve matching links by taking the context of the link anchor, i.e. its surrounding text, into account. Best entry points are estimated from a combination of title-based and content-based similarity. Our goal was to evaluate syntactic title-matching properties and the influence of the context around anchors on disambiguation and best-entry-point detection. Results show that the whole Wikipedia page provides the best context for resolving links and that simple scoring of anchor texts based on inverse document frequency is also capable of achieving high accuracy.
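
    The two ideas named in the abstract, scoring anchor texts by inverse document frequency and resolving an ambiguous anchor through its surrounding context, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tokenizer, the averaging of term IDF values, and the use of cosine similarity over bag-of-words counts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(documents):
    """Map each term to log(N / df), where df is its document frequency."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}


def anchor_score(anchor, idf):
    """Average IDF of the anchor's terms (assumed scoring rule)."""
    terms = tokenize(anchor)
    if not terms:
        return 0.0
    return sum(idf.get(t, 0.0) for t in terms) / len(terms)


def cosine(a, b):
    """Cosine similarity between the bag-of-words counts of two texts."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def resolve(context, candidates):
    """Pick the candidate page title whose text best matches the context.

    `candidates` maps a page title to that page's text (hypothetical layout).
    """
    return max(candidates, key=lambda title: cosine(context, candidates[title]))


# Toy data, invented purely for this illustration.
pages = {
    "Java (programming language)": "java is a programming language used for software",
    "Java (island)": "java is an island of indonesia with volcanoes",
}
idf = idf_table(list(pages.values()))
context = "the software was written in the java programming language"
best = resolve(context, pages)
```

    In this toy setting, a term that occurs in every page receives an IDF of zero and so contributes nothing to an anchor's score, while the context passed to `resolve` can be widened from a sentence to the whole page, which is the variation the abstract reports as most effective.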
    Original language: English
    Number of pages: 12
    Publication status: Published - Dec 2008
    Event: 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008 - Dagstuhl, Germany
    Duration: 15 Dec 2008 to 18 Dec 2008
    Conference number: 7

    Workshop

    Workshop: 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008
    Abbreviated title: INEX
    Country: Germany
    City: Dagstuhl
    Period: 15/12/08 to 18/12/08

    Keywords

    • INEX
    • Link-the-Wiki
    • Content based approach
    • Similarity analysis
