A Hybrid System for German Encyclopedia Alignment

Roman Kern, Christin Seifert, Michael Granitzer

Research output: Contribution to journal › Article › Academic › peer-review

2 Citations (Scopus)

Abstract

Collaboratively created on-line encyclopedias have become increasingly popular. Especially in terms of completeness, they have begun to surpass their printed counterparts. Two German publishers of traditional encyclopedias have reacted to this challenge and started an initiative to merge their corpora to create a single, more complete encyclopedia. The crucial step in this merging process is the alignment of articles. We have developed a two-step hybrid system to provide highly accurate alignments with low manual effort. First, we apply an information retrieval based, automatic alignment algorithm. Second, the articles with a low confidence score are revised using a manual alignment scheme carefully designed for quality assurance. Our evaluation shows that a combination of weighting and ranking techniques utilizing different facets of the encyclopedia articles allows us to effectively reduce the number of necessary manual alignments. Further, the setup of the manual alignment turned out to be robust against inter-indexer inconsistencies. As a result, the developed system empowered us to align four encyclopedias with high accuracy and low effort.
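
The two-step flow described in the abstract (automatic alignment first, manual revision for low-confidence cases) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' method: plain TF-IDF cosine similarity stands in for the paper's weighting and ranking techniques, the best similarity score is used as the confidence score, and the names `align`, `tfidf_vectors`, and the `threshold` value of 0.5 are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Map each document to a sparse TF-IDF vector (dict: term -> weight)."""
    token_lists = [tokenize(d) for d in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF, so terms that occur in every document keep a weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def align(source, target, threshold=0.5):
    """Step 1: pick the best target article for each source article.
    Step 2: queue every pair scoring below `threshold` for manual review.

    `source` and `target` map article ids to article text.
    Returns (auto, manual): `auto` maps source id -> (target id, score),
    `manual` is a list of (source id, suggested target id, score).
    """
    src_ids, tgt_ids = list(source), list(target)
    vectors = tfidf_vectors(
        [source[i] for i in src_ids] + [target[j] for j in tgt_ids]
    )
    src_vecs = vectors[: len(src_ids)]
    tgt_vecs = vectors[len(src_ids):]
    auto, manual = {}, []
    for sid, sv in zip(src_ids, src_vecs):
        scored = [(cosine(sv, tv), tid) for tid, tv in zip(tgt_ids, tgt_vecs)]
        best_score, best_id = max(scored, key=lambda p: p[0], default=(0.0, None))
        if best_score >= threshold:
            auto[sid] = (best_id, best_score)
        else:
            manual.append((sid, best_id, best_score))
    return auto, manual
```

Calling `align(source, target)` with two dictionaries of article texts returns the automatically accepted pairs and the queue left for human indexers. Raising the hypothetical `threshold` sends more articles to manual review, which is the trade-off between accuracy and manual effort that the abstract refers to.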
Original language: English
Pages (from-to): 75–89
Number of pages: 15
Journal: International journal on digital libraries
Volume: 11
Issue number: 2
Publication status: Published - 2011
Externally published: Yes

Keywords

  • Encyclopedia alignment
  • Semantic similarity
  • Hybrid alignment system
