MapReduce for information retrieval evaluation: "Let's quickly test this on 12 TB of data"

Djoerd Hiemstra, C. Hauff

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

Abstract

We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low-cost machines to search a web crawl of 0.5 billion pages, showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at: http://mirex.sourceforge.net.
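
The sequential-scanning idea in the abstract can be illustrated with a minimal in-memory sketch. This is not the authors' code (which is available at the URL above) and it does not use Hadoop; the function names `map_document`, `reduce_query`, and `sequential_scan`, the term-frequency score, and the toy documents are all assumptions made for illustration. The map step scores each document against every query, and the reduce step keeps the top-k documents per query.

```python
import heapq
from collections import defaultdict


def map_document(doc_id, text, queries):
    """Map step: emit (query_id, (score, doc_id)) for each query the document matches."""
    tokens = text.lower().split()
    for query_id, query in queries.items():
        # Placeholder score (an assumption): summed term frequency of the query terms.
        score = sum(tokens.count(term) for term in query.lower().split())
        if score > 0:
            yield query_id, (score, doc_id)


def reduce_query(pairs, k):
    """Reduce step: keep the k highest-scoring documents for one query."""
    return heapq.nlargest(k, pairs)


def sequential_scan(documents, queries, k=2):
    """Scan every document once; the dict stands in for the shuffle phase."""
    grouped = defaultdict(list)
    for doc_id, text in documents:
        for query_id, pair in map_document(doc_id, text, queries):
            grouped[query_id].append(pair)
    return {qid: reduce_query(pairs, k) for qid, pairs in grouped.items()}


# Toy collection and queries, invented for this example.
documents = [
    ("d1", "mapreduce for large scale retrieval"),
    ("d2", "sequential scanning of a web crawl"),
    ("d3", "retrieval experiments with mapreduce and more mapreduce"),
]
queries = {"q1": "mapreduce retrieval", "q2": "web crawl"}
results = sequential_scan(documents, queries)
```

Because every document is scored independently, the map step can in principle be spread over many machines, and trying a new retrieval approach would only mean swapping the scoring line; no index needs to be rebuilt.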
Original language: Undefined
Title of host publication: International Conference of the Cross-Language Evaluation Forum, CLEF: Multilingual and Multimodal Information Access Evaluation
Editors: Maristella Agosti, Nicola Ferro, Carol Peters, Maarten de Rijke, Alan Smeaton
Place of Publication: Berlin
Publisher: Springer
Pages: 64-69
Number of pages: 6
ISBN (Print): 978-3-642-15997-8
DOIs
Publication status: Published - 2010
Event: CLEF (Cross-Language Evaluation Forum) Conference on Multilingual and Multimodal Information Access Evaluation 2010 - Padua, Italy
Duration: 20 Sep 2010 to 23 Sep 2010
http://clef2010.clef-initiative.eu/

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Verlag
Volume: 6360

Conference

Conference: CLEF (Cross-Language Evaluation Forum) Conference on Multilingual and Multimodal Information Access Evaluation 2010
Abbreviated title: CLEF
Country: Italy
City: Padua
Period: 20/09/10 to 23/09/10
Internet address

Keywords

  • IR-73226
  • EWI-18469
  • METIS-271032
