Abstract
Automatically linking Wikipedia pages can be done either content-based, by exploiting word similarities, or structure-based, by exploiting characteristics of the link graph. Our approach follows a content-based strategy: it detects Wikipedia titles as link candidates and selects the most relevant ones as links. The relevance calculation is based on the context, i.e., the text surrounding a link candidate. Our goal was to evaluate the influence of the link context on selecting relevant links and determining a link's best entry point. Results show that a whole Wikipedia page provides the best context for resolving links and that straightforward inverse-document-frequency-based scoring of anchor texts achieves around 4% less Mean Average Precision on the provided data set. © 2009 Springer Berlin Heidelberg.
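The inverse document frequency baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the smoothing formula, the function names `idf` and `rank_link_candidates`, and the toy document frequencies are all assumptions chosen for illustration. The sketch finds known page titles in a text and ranks them so that rarer titles come first.

```python
import math
import re


def idf(doc_freq: int, num_docs: int) -> float:
    """Smoothed inverse document frequency: rarer terms score higher."""
    return math.log((1 + num_docs) / (1 + doc_freq))


def rank_link_candidates(text, title_doc_freq, num_docs):
    """Find known titles in `text` and rank them by the IDF of the title.

    Returns a list of (title, character offset, score) tuples.
    """
    lowered = text.lower()
    scored = []
    for title, df in title_doc_freq.items():
        # Whole-word, case-insensitive match of the title as anchor text.
        match = re.search(r"\b" + re.escape(title.lower()) + r"\b", lowered)
        if match:
            scored.append((title, match.start(), idf(df, num_docs)))
    # Highest IDF first; ties are broken by position in the text.
    return sorted(scored, key=lambda c: (-c[2], c[1]))


# Hypothetical toy data: title -> number of pages containing the title text.
title_doc_freq = {"Information retrieval": 40, "Germany": 9000, "XML": 1200}
page = "XML retrieval workshops in Germany advanced information retrieval."
for title, offset, score in rank_link_candidates(page, title_doc_freq, 100_000):
    print(f"{title!r} at offset {offset}: idf={score:.2f}")
```

A context-based variant, as studied in the paper, would additionally compare the text around each candidate with the target page; that step is deliberately left out of this sketch.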
Original language | English |
---|---|
Title of host publication | Advances in Focused Retrieval |
Subtitle of host publication | 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, Dagstuhl Castle, Germany, December 15-18, 2008. Revised and Selected Papers |
Place of Publication | Berlin, Heidelberg |
Publisher | Springer |
Pages | 354-365 |
Number of pages | 12 |
ISBN (Electronic) | 978-3-642-03761-0 |
ISBN (Print) | 978-3-642-03760-3 |
Publication status | Published - 2009 |
Externally published | Yes |
Event | 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, Dagstuhl, Germany. Duration: 15 Dec 2008 → 18 Dec 2008. Conference number: 7 |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer |
Volume | 5631 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Workshop
Workshop | 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008 |
---|---|
Abbreviated title | INEX |
Country/Territory | Germany |
City | Dagstuhl |
Period | 15/12/08 → 18/12/08 |
Keywords
- Context Exploitation
- INEX
- Link-the-Wiki
- Proximity
- Search engines
- XML mining
- XML-Retrieval
- Classification
- Data mining
- Information retrieval
- Knowledge discovery
- Large sets
- p2p search
- Performance evaluation
- Similarity detection
- Self organizing