Creating a Dutch Information Retrieval Test Corpus

Djoerd Hiemstra, David van Leeuwen

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

Abstract

This paper describes the first large-scale evaluation of information retrieval systems using Dutch documents and queries. We describe in detail the characteristics of the Dutch test data, which is part of the official CLEF multilingual test corpus, and give an overview of the experimental results of the companies and research institutions that participated in the first official Dutch CLEF experiments. Judging from these experiments, handling language-specific issues of Dutch, such as simple morphology and compound nouns, significantly improves the performance of information retrieval systems in many cases. Careful examination of the test collection shows that it serves as a reliable tool for the evaluation of information retrieval systems in the future.
Original language: English
Title of host publication: Computational Linguistics in the Netherlands 2001
Subtitle of host publication: Selected Papers from the Twelfth CLIN Meeting
Editors: Mariët Theune, Anton Nijholt, Hendri Hondorp
Place of Publication: Amsterdam, The Netherlands
Publisher: Rodopi
Pages: 133-147
Number of pages: 15
ISBN (Print): 978-90-04-33403-8
DOIs: https://doi.org/10.1163/9789004334038_012
Publication status: Published - 2002
Event: 12th Meeting on Computational Linguistics in the Netherlands, CLIN 2001 - University of Twente, Enschede, Netherlands
Duration: 30 Nov 2001 to 30 Nov 2001
Conference number: 12

Publication series

Name: Language and Computers - Studies in Practical Linguistics
Publisher: Rodopi
Volume: 45

Conference

Conference: 12th Meeting on Computational Linguistics in the Netherlands, CLIN 2001
Abbreviated title: CLIN
Country: Netherlands
City: Enschede
Period: 30/11/01 to 30/11/01
Other: 30 Nov 2001

Keywords

  • DB-IR: INFORMATION RETRIEVAL

Cite this

Hiemstra, D., & van Leeuwen, D. (2002). Creating a Dutch Information Retrieval Test Corpus. In M. Theune, A. Nijholt, & H. Hondorp (Eds.), Computational Linguistics in the Netherlands 2001: Selected Papers from the Twelfth CLIN Meeting (pp. 133-147). (Language and Computers - Studies in Practical Linguistics; Vol. 45). Amsterdam, The Netherlands: Rodopi. https://doi.org/10.1163/9789004334038_012