Abstract
In this paper we describe a systematic approach to deriving a bilingual lexicon automatically from parallel corpora. Following this approach, a lexicon was derived from the English and Dutch versions of the Agenda 21 corpus. Using the lexicon and a part of the corpus that was not used to derive it, a bilingual retrieval environment was built. Recall and precision of monolingual (Dutch) retrieval were compared with those of bilingual (Dutch-to-English) retrieval. An experiment was conducted with the help of eight naive users, who formulated queries and judged the relevance of retrieved fragments. The experiment shows 78% precision and 51% relative recall for monolingual retrieval, against 67% precision and 82% relative recall for bilingual retrieval.
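For readers unfamiliar with the two measures reported above, the following is a minimal sketch of how precision and relative recall are commonly computed. It assumes that relative recall is taken against the pool of relevant fragments found by the runs being compared, because the full set of relevant fragments in a corpus is usually unknown. The abstract does not state the exact definition used in the paper, so this is an assumption. The function names and fragment IDs are hypothetical and are not data from the experiment.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved fragments that were judged relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def relative_recall(retrieved, relevant_pool):
    """Fraction of the pooled relevant fragments that this run retrieved.

    Assumption: the pool is the set of relevant fragments found by any of
    the runs being compared, not all relevant fragments in the corpus.
    """
    retrieved, relevant_pool = set(retrieved), set(relevant_pool)
    if not relevant_pool:
        return 0.0
    return len(retrieved & relevant_pool) / len(relevant_pool)


# Hypothetical fragment IDs, for illustration only (not data from the paper).
mono_run = ["f1", "f2", "f3", "f4"]        # monolingual run
bi_run = ["f2", "f3", "f5", "f6", "f7"]    # bilingual run
judged_relevant = {"f1", "f2", "f3", "f5", "f6"}

# Pool of relevant fragments retrieved by at least one of the two runs.
pool = judged_relevant & (set(mono_run) | set(bi_run))

print(precision(mono_run, judged_relevant), relative_recall(mono_run, pool))
print(precision(bi_run, judged_relevant), relative_recall(bi_run, pool))
```

Under this pooled definition, a run can score high on precision and still low on relative recall if the other run finds many relevant fragments that it misses.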
Original language | English |
---|---|
Title of host publication | GRONICS '97 |
Subtitle of host publication | Proceedings of the Fourth Groningen International Information Technology Conference for Students |
Editors | M. Heemskerk, M. Diepenhorst |
Place of Publication | Groningen |
Publisher | University of Groningen |
Pages | 21-26 |
Number of pages | 6 |
ISBN (Print) | 9789036707299 |
Publication status | Published - 1997 |
Event | Fourth Groningen International Information Technology Conference for Students, GRONICS 1997, Groningen, Netherlands. Duration: 21 Feb 1997 → 21 Feb 1997. Conference number: 4 |
Conference
Conference | Fourth Groningen International Information Technology Conference for Students, GRONICS 1997 |
---|---|
Abbreviated title | GRONICS |
Country/Territory | Netherlands |
City | Groningen |
Period | 21/02/97 → 21/02/97 |
Keywords
- EWI-9494
- IR-66999
- METIS-122303
- Lexical acquisition
- Parallel corpora
- Statistical Natural Language Processing
- Cross Language Information Retrieval