Multilingual domain modeling in Twenty-One: automatic creation of a bi-directional translation lexicon from a parallel corpus

Djoerd Hiemstra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


Abstract

Within the Twenty-One project, which aims at the effective dissemination of information on ecology and sustainable development, a system is being developed that supports cross-language information retrieval in any of four languages: Dutch, English, French and German. Knowledge of this application domain is needed to enhance existing translation resources for the purpose of lexical disambiguation. This paper describes an algorithm for the automated acquisition of a translation lexicon from a parallel corpus. What is new in the presented algorithm is the statistical language model it uses. Because the algorithm is based on a symmetric translation model, it becomes possible to identify one-to-many and many-to-one relations between the words of a language pair. We claim that the presented method has two advantages over previously published algorithms. First, because the translation model is more powerful, the resulting bilingual lexicon will be more accurate. Second, the resulting bilingual lexicon can be used to translate in both directions between the languages of a pair. Different versions of the algorithm were evaluated on the Dutch and English versions of the Agenda 21 corpus, a UN document in the application domain of sustainable development.
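
As an illustration of what a bi-directional lexicon derived from a parallel corpus looks like, here is a minimal sketch. It is not the algorithm from the paper: it swaps in a much simpler technique, the Dice coefficient over sentence-level co-occurrence counts, which happens to be symmetric in the two languages. The function names (`build_cooccurrence`, `dice_scores`, `translate`) and the three-sentence toy corpus are assumptions made for this example and do not come from the Agenda 21 data.

```python
from collections import Counter
from itertools import product


def build_cooccurrence(pairs):
    """Count in how many aligned sentence pairs each word and each word pair occurs."""
    joint, src_freq, tgt_freq = Counter(), Counter(), Counter()
    for src_sentence, tgt_sentence in pairs:
        # Use sets so a word is counted at most once per sentence.
        src_words = set(src_sentence.lower().split())
        tgt_words = set(tgt_sentence.lower().split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        joint.update(product(src_words, tgt_words))
    return joint, src_freq, tgt_freq


def dice_scores(joint, src_freq, tgt_freq):
    """Dice coefficient per (source, target) word pair; symmetric by construction."""
    return {
        (s, t): 2.0 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in joint.items()
    }


def translate(scores, word, direction="src_to_tgt"):
    """Rank candidate translations of `word`, reading the same table in either direction."""
    idx = 0 if direction == "src_to_tgt" else 1
    candidates = [
        (pair[1 - idx], score) for pair, score in scores.items() if pair[idx] == word
    ]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(candidates, key=lambda c: (-c[1], c[0]))


# Hypothetical toy corpus of aligned Dutch/English sentences (illustration only).
pairs = [
    ("duurzame ontwikkeling", "sustainable development"),
    ("duurzame landbouw", "sustainable agriculture"),
    ("ontwikkeling van landbouw", "development of agriculture"),
]
scores = dice_scores(*build_cooccurrence(pairs))

print(translate(scores, "duurzame")[0])
print(translate(scores, "development", direction="tgt_to_src")[0])
```

Because a single score table is queried from either side, the same lexicon serves Dutch-to-English and English-to-Dutch lookup. Unlike the symmetric translation model described in the abstract, this simplified scoring does not by itself identify one-to-many or many-to-one relations between words.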
Original language: English
Title of host publication: Computational Linguistics in the Netherlands 1997
Subtitle of host publication: selected papers from the eighth CLIN Meeting
Editors: Peter-Arno Coppen, Hans van Halteren, Lisanne Teunissen
Publisher: Rodopi
Pages: 41-58
Number of pages: 18
ISBN (Print): 9789042005044
Publication status: Published - 1998
Event: 8th Meeting on Computational Linguistics in the Netherlands, CLIN 1997 - Katholieke Universiteit Nijmegen, Nijmegen
Duration: 1 Dec 1997 → 1 Dec 1997
Conference number: 8
http://odur.let.rug.nl/~vannoord/clin/nijmegen97.html

Publication series

Name: Language and computers
Publisher: Rodopi
Number: 25

Workshop

Workshop: 8th Meeting on Computational Linguistics in the Netherlands, CLIN 1997
Abbreviated title: CLIN
City: Nijmegen
Period: 1/12/97 → 1/12/97

Keywords

  • CR-I.2.7
