Deriving a Bilingual Lexicon for Cross-Language Information Retrieval

Djoerd Hiemstra

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Professional

Abstract

In this paper we describe a systematic approach to deriving a bilingual lexicon automatically from parallel corpora. Following this approach, a lexicon was derived from the English and Dutch versions of the Agenda 21 corpus. With the lexicon and a part of the corpus that was not used to derive it, a bilingual retrieval environment was built. Recall and precision of monolingual (Dutch) retrieval were compared with recall and precision of bilingual (Dutch-to-English) retrieval. An experiment was conducted with the help of eight naive users, who formulated queries and judged the relevance of retrieved fragments. The experiment shows 78% precision and 51% relative recall for monolingual retrieval, against 67% precision and 82% relative recall for bilingual retrieval.
Original language: English
Title of host publication: GRONICS '97
Subtitle of host publication: Proceedings of the Fourth Groningen International Information Technology Conference for Students
Editors: M. Heemskerk, M. Diepenhorst
Place of Publication: Groningen
Publisher: University of Groningen
Pages: 21-26
Number of pages: 6
ISBN (Print): 9789036707299
Publication status: Published - 1997
Event: Fourth Groningen International Information Technology Conference for Students, GRONICS 1997 - Groningen, Netherlands
Duration: 21 Feb 1997
Conference number: 4

Conference

Conference: Fourth Groningen International Information Technology Conference for Students, GRONICS 1997
Abbreviated title: GRONICS
Country/Territory: Netherlands
City: Groningen
Period: 21/02/97

Keywords

  • EWI-9494
  • IR-66999
  • METIS-122303
  • Lexical acquisition
  • Parallel corpora
  • Statistical Natural Language Processing
  • Cross Language Information Retrieval
