From general to specialized domain: Analyzing three crucial problems of biomedical entity disambiguation

Stefan Zwicklbauer, Christin Seifert, Michael Granitzer

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

    3 Citations (Scopus)

    Abstract

    Entity disambiguation is the task of mapping ambiguous terms in natural-language text to their corresponding entities in a knowledge base. Most disambiguation systems focus on general-purpose knowledge bases like DBpedia but leave open the question of how their results generalize to more specialized domains. This question is very important in the context of Linked Open Data, which forms an enormous resource for disambiguation. We implement a ranking-based (Learning To Rank) disambiguation system and provide a systematic evaluation of biomedical entity disambiguation with respect to three crucial and well-known properties of specialized disambiguation systems: (i) entity context, i.e. the way entities are described, (ii) user data, i.e. the quantity and quality of externally disambiguated entities, and (iii) the quantity and heterogeneity of entities to disambiguate, i.e. the number and size of different domains in a knowledge base. Our results show that (i) the choice of entity context that yields the best disambiguation results strongly depends on the amount of available user data, (ii) disambiguation results with large-scale and heterogeneous knowledge bases strongly depend on the entity context, (iii) disambiguation results are robust against a moderate amount of noise in user data, and (iv) some results can be significantly improved with a federated disambiguation approach that uses different entity contexts. Our results indicate that disambiguation systems must be carefully adapted when their knowledge bases are expanded with special-domain entities.
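    To make the idea of ranking-based disambiguation concrete, the sketch below ranks candidate entities for one ambiguous mention with a linear scoring function learned from pairwise constraints (a simple pairwise ranking perceptron). This is a generic learning-to-rank technique chosen for illustration, not the authors' system: the `Candidate` type, the two toy features (imagined as an entity-context similarity and a noise feature), and the functions `train_pairwise` and `disambiguate` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate knowledge-base entity for one ambiguous mention (hypothetical)."""
    entity_id: str
    features: list[float]  # hypothetical features, e.g. [context_similarity, noise]


def score(weights: list[float], features: list[float]) -> float:
    """Linear relevance score: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features))


def train_pairwise(queries, n_features: int, epochs: int = 10, lr: float = 0.1) -> list[float]:
    """Pairwise ranking perceptron.

    queries: list of (candidates, gold_entity_id); the gold entity must be
    among the candidates. Whenever a wrong candidate scores at least as high
    as the gold one, the weights move toward the gold feature vector.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for cands, gold in queries:
            gold_c = next(c for c in cands if c.entity_id == gold)
            for c in cands:
                if c.entity_id == gold:
                    continue
                # Pairwise constraint violated: gold should outrank c.
                if score(w, gold_c.features) <= score(w, c.features):
                    w = [wi + lr * (g - o)
                         for wi, g, o in zip(w, gold_c.features, c.features)]
    return w


def disambiguate(weights: list[float], cands: list[Candidate]) -> str:
    """Return the id of the highest-ranked candidate entity."""
    return max(cands, key=lambda c: score(weights, c.features)).entity_id


if __name__ == "__main__":
    # Toy "user data": mentions whose correct entity is already known.
    training = [
        ([Candidate("A", [0.9, 0.1]), Candidate("B", [0.2, 0.8])], "A"),
        ([Candidate("C", [0.8, 0.3]), Candidate("D", [0.1, 0.9])], "C"),
    ]
    weights = train_pairwise(training, n_features=2)
    new_mention = [Candidate("E", [0.7, 0.2]), Candidate("F", [0.3, 0.6])]
    print(disambiguate(weights, new_mention))  # prints E
```

    In this toy setup the learned weight on the first feature becomes positive and the weight on the noise feature negative, so the candidate with higher (assumed) context similarity wins. A real system would use many more features and a stronger ranker; the point here is only the pairwise "gold above wrong candidate" training signal.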
    Original language: English
    Title of host publication: Database and Expert Systems Applications
    Subtitle of host publication: 26th International Conference, DEXA 2015, Valencia, Spain, September 1-4, 2015, Proceedings, Part I
    Publisher: Springer
    Pages: 76-93
    Number of pages: 18
    ISBN (Print): 9783319228488
    DOI: 10.1007/978-3-319-22849-5_6
    Publication status: Published - 2015
    Event: 26th International Conference on Database and Expert Systems Applications, DEXA 2015 - Valencia, Spain
    Duration: 1 Sep 2015 to 4 Sep 2015
    Conference number: 26
    http://www.dexa.org/previous/dexa2015/

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 9261

    Conference

    Conference: 26th International Conference on Database and Expert Systems Applications, DEXA 2015
    Abbreviated title: DEXA
    Country: Spain
    City: Valencia
    Period: 1/09/15 to 4/09/15

    Keywords

    • Entity disambiguation
    • Learning to rank
    • Linked data
    • Semantic web

    Cite this

    Zwicklbauer, S., Seifert, C., & Granitzer, M. (2015). From general to specialized domain: Analyzing three crucial problems of biomedical entity disambiguation. In Database and Expert Systems Applications: 26th International Conference, DEXA 2015, Valencia, Spain, September 1-4, 2015, Proceedings, Part I (pp. 76-93). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9261). Springer. https://doi.org/10.1007/978-3-319-22849-5_6