Abstract
We propose using MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which a cluster of 15 low-cost machines searches a web crawl of 0.5 billion pages, showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at: http://mirex.sourceforge.net.
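The idea of sequential scanning in a map/reduce style can be illustrated with a minimal single-process sketch. It is not the code from the project linked above: the function names (`map_document`, `reduce_query`, `sequential_scan`) are hypothetical, the term-count scoring is an assumed stand-in for whatever retrieval model is being tested, and an in-memory dictionary stands in for the shuffle phase that a real MapReduce cluster would perform.

```python
import heapq
from collections import defaultdict


def map_document(doc_id, text, queries):
    """Map step: score one document against every query."""
    tokens = text.lower().split()
    for query_id, query in queries.items():
        # Assumed scoring for illustration: total occurrences of query terms.
        score = sum(tokens.count(term) for term in query.lower().split())
        if score > 0:
            yield query_id, (score, doc_id)


def reduce_query(scored_docs, k):
    """Reduce step: keep the k highest-scoring documents for one query."""
    return heapq.nlargest(k, scored_docs)


def sequential_scan(documents, queries, k=10):
    """Scan every document once, then rank the matches per query."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for doc_id, text in documents.items():
        for query_id, scored in map_document(doc_id, text, queries):
            grouped[query_id].append(scored)
    return {qid: reduce_query(docs, k) for qid, docs in grouped.items()}
```

Because no index is built, trying a different retrieval approach in this sketch only requires changing the scoring line in `map_document`; every document is simply read again on the next run.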
Original language | Undefined |
---|---|
Title of host publication | International Conference of the Cross-Language Evaluation Forum, CLEF: Multilingual and Multimodal Information Access Evaluation |
Editors | Maristella Agosti, Nicola Ferro, Carol Peters, Maarten de Rijke, Alan Smeaton |
Place of Publication | Berlin |
Publisher | Springer |
Pages | 64-69 |
Number of pages | 6 |
ISBN (Print) | 978-3-642-15997-8 |
Publication status | Published - 2010 |
Event | CLEF (Cross-Language Evaluation Forum) Conference on Multilingual and Multimodal Information Access Evaluation 2010, Padua, Italy. Duration: 20 Sept 2010 → 23 Sept 2010. http://clef2010.clef-initiative.eu/ |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer Verlag |
Volume | 6360 |
Conference
Conference | CLEF (Cross-Language Evaluation Forum) Conference on Multilingual and Multimodal Information Access Evaluation 2010 |
---|---|
Abbreviated title | CLEF |
Country/Territory | Italy |
City | Padua |
Period | 20/09/10 → 23/09/10 |
Keywords
- IR-73226
- EWI-18469
- METIS-271032