A hybrid approach for robust multilingual toponym extraction and disambiguation

Mena Badieh Habib, Maurice van Keulen

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

Abstract

Toponym extraction and disambiguation are key topics recently addressed by the fields of Information Extraction and Geographical Information Retrieval. The two processes are highly dependent on each other: not only does extraction effectiveness affect disambiguation, but disambiguation results may also help improve extraction accuracy. In this paper we propose a hybrid toponym extraction approach based on Hidden Markov Models (HMM) and Support Vector Machines (SVM). A Hidden Markov Model is used for extraction with high recall and low precision. An SVM is then used to find false positives based on informativeness features and coherence features derived from the disambiguation results. Experiments conducted on a set of descriptions of holiday homes, with the aim of extracting and disambiguating toponyms, showed that the proposed approach outperforms state-of-the-art extraction methods and is also robust. Robustness is demonstrated in three respects: language independence, high and low HMM threshold settings, and limited training data.
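
The two-stage idea in the abstract (a permissive extractor followed by a false-positive filter) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the HMM is replaced by precomputed confidence scores with a low threshold, the trained SVM is replaced by a fixed linear decision function, and the names `Candidate`, `WEIGHTS`, `BIAS`, and `SAMPLE`, along with all feature values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    extraction_score: float  # stand-in for an HMM confidence (assumption)
    informativeness: float   # hypothetical feature in [0, 1]
    coherence: float         # hypothetical feature in [0, 1], from disambiguation


def extract_high_recall(candidates, threshold):
    """Stage 1: a low threshold keeps most candidates (high recall, low precision)."""
    return [c for c in candidates if c.extraction_score >= threshold]


# Hypothetical weights and bias; a real system would learn these by training an SVM.
WEIGHTS = (1.0, 1.0)
BIAS = -1.0


def is_toponym(candidate):
    """Stage 2: linear decision function standing in for a trained SVM."""
    score = (WEIGHTS[0] * candidate.informativeness
             + WEIGHTS[1] * candidate.coherence
             + BIAS)
    return score > 0


def hybrid_extract(candidates, threshold=0.1):
    """Run the permissive extractor, then drop candidates the filter rejects."""
    return [c.text for c in extract_high_recall(candidates, threshold) if is_toponym(c)]


# Invented sample data for illustration only.
SAMPLE = [
    Candidate("Paris", 0.9, 0.8, 0.9),
    Candidate("Nice", 0.3, 0.2, 0.1),        # e.g. an adjective mistaken for a city
    Candidate("Lake Garda", 0.2, 0.7, 0.8),
    Candidate("the", 0.01, 0.0, 0.0),
]

if __name__ == "__main__":
    print(hybrid_extract(SAMPLE))
```

In this sketch the first stage lets "Nice" through because its score clears the low threshold, and the second stage rejects it because its informativeness and coherence features are weak. That mirrors the division of labour the abstract describes, with the disambiguation output feeding back into extraction through the coherence feature.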
Original language: Undefined
Title of host publication: Proceedings of the International Conference on Language Processing and Intelligent Information Systems (LP&IIS 2013)
Place of Publication: Berlin
Publisher: Springer
Pages: 1-15
Number of pages: 15
ISBN (Print): 978-3-642-38633-6
DOIs: https://doi.org/10.1007/978-3-642-38634-3_1
Publication status: Published - Jun 2013

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Verlag
Volume: 7912

Keywords

  • EWI-23152
  • Named Entity Recognition
  • Named Entity Linking
  • Named Entity Extraction
  • Named Entity Disambiguation
  • Toponym Extraction
  • Toponyms Disambiguation
  • Hybrid System
  • Multilingual Extraction and Disambiguation
  • IR-84626
  • METIS-296346

Cite this

Habib, M. B., & van Keulen, M. (2013). A hybrid approach for robust multilingual toponym extraction and disambiguation. In Proceedings of the International Conference on Language Processing and Intelligent Information Systems (LP&IIS 2013) (pp. 1-15). (Lecture Notes in Computer Science; Vol. 7912). Berlin: Springer. https://doi.org/10.1007/978-3-642-38634-3_1