Context Resolution Strategies for Automatic Wikipedia Learning

Michael Granitzer, Christin Seifert, Mario Zechner

Research output: Contribution to conference › Paper › peer-review

Abstract

Automatic linking of Wikipedia pages mostly follows one of two strategies: (i) a content-based strategy that relies on word similarities, or (ii) a structural strategy that exploits link characteristics. Our approach is content-based: we find anchors using the titles of candidate Wikipedia pages and resolve matching links by taking the context of the link anchor, i.e. its surrounding text, into account. Best entry points are estimated from a combination of title-based and content-based similarity. Our goal was to evaluate syntactic title matching properties and the influence of the context around anchors on disambiguation and best-entry-point detection. Results show that the whole Wikipedia page provides the best context for resolving links and that simple scoring of anchor texts based on inverse document frequency also achieves high accuracy.
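
The abstract names two ingredients without giving formulas: scoring anchor texts by inverse document frequency, and resolving an ambiguous anchor by comparing its context with candidate pages. The following is a minimal sketch of both ideas, not the authors' implementation. The function names, the tokenizer, the mean-IDF anchor score, and the use of bag-of-words cosine similarity are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; the paper's tokenizer is not specified.
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(docs):
    # Inverse document frequency per term: log(N / document frequency).
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}


def anchor_score(title, idf):
    # Hypothetical anchor score: mean IDF of the title's terms.
    # Terms missing from the table contribute 0.
    terms = tokenize(title)
    if not terms:
        return 0.0
    return sum(idf.get(t, 0.0) for t in terms) / len(terms)


def cosine(a, b):
    # Cosine similarity between raw term-count vectors of two texts.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def resolve(context, candidates):
    # candidates maps a page title to its text (illustrative data shape).
    # Returns the title whose page text is most similar to the anchor context.
    return max(candidates, key=lambda title: cosine(context, candidates[title]))
```

In this sketch the `context` argument can be a sentence, a paragraph, or the full source page, which is the dimension the abstract reports varying; a rare title such as a specific proper name would receive a higher `anchor_score` than a title made of common words.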
Original language: English
Number of pages: 12
Publication status: Published - Dec 2008
Externally published: Yes
Event: 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008 - Dagstuhl, Germany
Duration: 15 Dec 2008 to 18 Dec 2008
Conference number: 7

Workshop

Workshop: 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008
Abbreviated title: INEX
Country/Territory: Germany
City: Dagstuhl
Period: 15/12/08 to 18/12/08

Keywords

  • INEX
  • Link-the-Wiki
  • Content based approach
  • Similarity analysis
