A hybrid approach for robust multilingual toponym extraction and disambiguation

Mena Badieh Habib, Maurice van Keulen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

5 Citations (Scopus)
128 Downloads (Pure)

Abstract

Toponym extraction and disambiguation are key topics recently addressed by the fields of Information Extraction and Geographical Information Retrieval. The two processes are highly interdependent: not only does extraction effectiveness affect disambiguation, but disambiguation results may also help improve extraction accuracy. In this paper we propose a hybrid toponym extraction approach based on Hidden Markov Models (HMM) and Support Vector Machines (SVM). The HMM is used for extraction with high recall and low precision. The SVM is then used to find false positives based on informativeness features and coherence features derived from the disambiguation results. Experiments conducted on a set of holiday home descriptions, with the aim of extracting and disambiguating toponyms, showed that the proposed approach outperforms state-of-the-art extraction methods and is also robust. Robustness is demonstrated in three respects: language independence, high and low HMM threshold settings, and limited training data.
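The two-stage idea in the abstract (extract generously, disambiguate, then filter false positives using features derived from the disambiguation) can be illustrated with a minimal sketch. It is not the authors' implementation: a gazetteer lookup stands in for the low-threshold HMM, a hand-set linear decision function stands in for a trained SVM, and the gazetteer, the common-word list, the feature definitions, and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy gazetteer: toponym -> countries it may refer to.
GAZETTEER = {
    "Paris": ["FR", "US"],
    "Lyon": ["FR"],
    "Nice": ["FR"],
    "Garden": ["US"],
}
# Assumed word list used as a crude informativeness proxy.
COMMON_WORDS = {"garden", "nice"}

# Hand-set linear decision function; a trained SVM would learn these values.
WEIGHTS = (1.0, 1.5)  # (informativeness, coherence)
BIAS = -1.0


def extract_candidates(tokens):
    """Stage 1: high-recall extraction (stand-in for a low-threshold HMM)."""
    return [t for t in tokens if t in GAZETTEER]


def disambiguate(candidates):
    """Resolve each candidate towards the country with the most votes."""
    if not candidates:
        return None, {}
    votes = Counter(c for t in candidates for c in GAZETTEER[t])
    majority = votes.most_common(1)[0][0]
    resolved = {
        t: majority if majority in GAZETTEER[t] else GAZETTEER[t][0]
        for t in candidates
    }
    return majority, resolved


def features(token, majority, resolved):
    """Two illustrative features: informativeness and coherence."""
    informativeness = 0.0 if token.lower() in COMMON_WORDS else 1.0
    coherence = 1.0 if resolved[token] == majority else 0.0
    return informativeness, coherence


def hybrid_extract(tokens):
    """Stage 2: drop candidates the linear decision function rejects."""
    candidates = extract_candidates(tokens)
    majority, resolved = disambiguate(candidates)
    kept = []
    for t in candidates:
        x = features(t, majority, resolved)
        score = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
        if score > 0:
            kept.append(t)
    return kept


tokens = "Lovely house with Garden near Paris , Lyon and Nice".split()
print(hybrid_extract(tokens))
```

In this toy run the first stage proposes Garden, Paris, Lyon and Nice; the filter then rejects Garden (uninformative and incoherent with the other candidates) while keeping Nice, whose low informativeness is offset by its coherence with the French toponyms around it.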
Original language: Undefined
Title of host publication: Proceedings of the International Conference on Language Processing and Intelligent Information Systems (LP&IIS 2013)
Place of Publication: Berlin
Publisher: Springer
Pages: 1-15
Number of pages: 15
ISBN (Print): 978-3-642-38633-6
Publication status: Published - Jun 2013
Event: International Conference on Language Processing and Intelligent Information Systems, LP&IIS 2013 - Warsaw, Poland
Duration: 17 Jun 2013 to 18 Jun 2013

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Verlag
Volume: 7912

Conference

Conference: International Conference on Language Processing and Intelligent Information Systems, LP&IIS 2013
Period: 17/06/13 to 18/06/13
Other: 17-18 June 2013

Keywords

  • EWI-23152
  • Named Entity Recognition
  • Named Entity Linking
  • Named Entity Extraction
  • Named Entity Disambiguation
  • Toponym Extraction
  • IR-84626
  • Hybrid System
  • Multilingual Extraction and Disambiguation
  • METIS-296346
  • Toponyms Disambiguation
