MIREX: MapReduce Information Retrieval Experiments

Djoerd Hiemstra, C. Hauff

Research output: Book/Report › Report › Professional

Abstract

We propose using MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which a cluster of 15 low-cost machines searches a web crawl of 0.5 billion pages, showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at: http://sourceforge.net/projects/mirex/
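
The idea of sequential scanning with MapReduce can be illustrated with a minimal in-process sketch. This is not the MIREX code: the function names (`map_document`, `reduce_query`, `run`), the term-overlap score, and the toy documents and queries are all assumptions made for illustration. The map step scores each document against every query during a single pass over the collection, and the reduce step keeps the top-ranked documents per query.

```python
from collections import defaultdict
import heapq


def map_document(doc_id, text, queries):
    """Map step: score one document against every query.

    The score is a hypothetical term-overlap count, used only for illustration.
    """
    tokens = text.lower().split()
    for query_id, query in queries.items():
        score = sum(tokens.count(term) for term in query.lower().split())
        if score > 0:
            yield query_id, (score, doc_id)


def reduce_query(query_id, scored_docs, k=2):
    """Reduce step: keep the k highest-scoring documents for one query."""
    return query_id, heapq.nlargest(k, scored_docs)


def run(documents, queries, k=2):
    """Simulate the map, shuffle, and reduce phases in a single process."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():  # one sequential scan of all documents
        for query_id, pair in map_document(doc_id, text, queries):
            grouped[query_id].append(pair)  # shuffle: group pairs by query
    return dict(reduce_query(q, pairs, k) for q, pairs in grouped.items())


# Toy data, invented for this sketch.
documents = {
    "d1": "mapreduce cluster experiments",
    "d2": "information retrieval experiments on a cluster",
    "d3": "cooking recipes",
}
queries = {"q1": "retrieval experiments", "q2": "cluster"}

results = run(documents, queries)
```

In this sketch no index is built: changing the retrieval approach would only mean changing the scoring inside `map_document`. On a real cluster the map calls would run in parallel over partitions of the collection.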
Original language: Undefined
Place of Publication: Enschede
Publisher: Centre for Telematics and Information Technology (CTIT)
Number of pages: 7
Publication status: Published - 14 Apr 2010

Publication series

Name: CTIT Technical Report Series
Publisher: Centre for Telematics and Information Technology, University of Twente
No.: TR-CTIT-10-15
ISSN (Print): 1381-3625

Keywords

  • METIS-270790
  • IR-71078
  • EWI-17797
