A Hybrid System for German Encyclopedia Alignment

Roman Kern, Christin Seifert, Michael Granitzer

    Research output: Contribution to journal, Article, Academic, peer-review

    2 Citations (Scopus)

    Abstract

    Collaboratively created on-line encyclopedias have become increasingly popular. In terms of completeness especially, they have begun to surpass their printed counterparts. Two German publishers of traditional encyclopedias have reacted to this challenge and started an initiative to merge their corpora into a single, more complete encyclopedia. The crucial step in this merging process is the alignment of articles. We have developed a two-step hybrid system that provides highly accurate alignments with low manual effort. First, we apply an automatic alignment algorithm based on information retrieval. Second, articles with a low confidence score are revised using a manual alignment scheme carefully designed for quality assurance. Our evaluation shows that a combination of weighting and ranking techniques, drawing on different facets of the encyclopedia articles, effectively reduces the number of manual alignments needed. Further, the setup of the manual alignment turned out to be robust against inter-indexer inconsistencies. As a result, the developed system enabled us to align four encyclopedias with high accuracy and low effort.
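
The two-step idea in the abstract (automatic alignment first, manual review for low-confidence cases) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration only: plain TF-IDF cosine similarity stands in for the paper's weighting and ranking techniques, which the abstract does not specify, and the function names, the sample articles, and the threshold value are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (Unicode-aware, so umlauts survive)."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF; one common variant, chosen here for illustration.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align(source, target, threshold):
    """Step 1: pick the best-scoring target article for each source article.
    Step 2: route pairs scoring below `threshold` to a manual review queue.
    `source` and `target` map article ids to article text."""
    src_ids, tgt_ids = list(source), list(target)
    vecs = tfidf_vectors([source[i] for i in src_ids] + [target[i] for i in tgt_ids])
    src_vecs, tgt_vecs = vecs[: len(src_ids)], vecs[len(src_ids):]

    automatic, manual_queue = {}, []
    for sid, sv in zip(src_ids, src_vecs):
        best_id, best_score = None, 0.0
        for tid, tv in zip(tgt_ids, tgt_vecs):
            score = cosine(sv, tv)
            if score > best_score:
                best_id, best_score = tid, score
        if best_score >= threshold:
            automatic[sid] = (best_id, best_score)
        else:
            manual_queue.append((sid, best_id, best_score))
    return automatic, manual_queue


# Hypothetical sample articles; the threshold of 0.3 is arbitrary.
source = {
    "a1": "Berlin ist die Hauptstadt von Deutschland",
    "a2": "Die Donau ist ein Fluss in Europa",
}
target = {
    "b1": "Berlin Hauptstadt Deutschland Stadt an der Spree",
    "b2": "Quantenmechanik ist eine physikalische Theorie",
}
automatic, manual_queue = align(source, target, threshold=0.3)
```

In this sketch the best similarity score doubles as the confidence score. Lowering the threshold shrinks the manual queue at the risk of accepting wrong alignments, which is the trade-off the hybrid setup is meant to manage.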
    Original language: English
    Pages (from-to): 75–89
    Number of pages: 15
    Journal: International Journal on Digital Libraries
    Volume: 11
    Issue number: 2
    Publication status: Published - 2011

    Keywords

    • Encyclopedia alignment
    • Semantic similarity
    • Hybrid alignment system
