We propose using MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low-cost machines to search a web crawl of 0.5 billion pages, showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at http://sourceforge.net/projects/mirex/
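The sequential-scanning idea in the abstract can be illustrated with a minimal single-process sketch. It is not the MIREX code: the function names, the raw term-count scoring, and the in-memory grouping step are all assumptions made for illustration. A real deployment would run the map and reduce steps on a MapReduce framework across the cluster.

```python
import heapq
from collections import defaultdict


def map_document(doc_id, text, queries):
    """Map step: scan one document and emit (query_id, (score, doc_id)) for each matching query.

    The score is a plain sum of query-term counts (an illustrative assumption).
    """
    tokens = text.lower().split()
    for query_id, query in queries.items():
        score = sum(tokens.count(term) for term in query.lower().split())
        if score > 0:
            yield query_id, (score, doc_id)


def reduce_query(pairs, k):
    """Reduce step: keep the k highest-scoring (score, doc_id) pairs for one query."""
    return heapq.nlargest(k, pairs)


def run(documents, queries, k=2):
    """Simulate the shuffle phase in memory: group map output by query, then reduce."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for query_id, pair in map_document(doc_id, text, queries):
            grouped[query_id].append(pair)
    return {query_id: reduce_query(pairs, k) for query_id, pairs in grouped.items()}
```

Because every document is scanned for every query, a new retrieval approach can be tried in this sketch by changing only the scoring expression in `map_document`, with no index to rebuild.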
|Place of Publication||Enschede|
|Publisher||Centre for Telematics and Information Technology (CTIT)|
|Number of pages||7|
|Publication status||Published - 14 Apr 2010|
|Series name||CTIT Technical Report Series|
|Series publisher||Centre for Telematics and Information Technology, University of Twente|
Hiemstra, D., & Hauff, C. (2010). MIREX: MapReduce Information Retrieval Experiments. (CTIT Technical Report Series; No. TR-CTIT-10-15). Enschede: Centre for Telematics and Information Technology (CTIT).