Abstract
We propose using MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low-cost machines to search a web crawl of 0.5 billion pages, showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at: http://mirex.sourceforge.net.
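To make the idea of retrieval by sequential scanning concrete, here is a minimal single-machine sketch in Python. It is not the released code and does not use a real MapReduce framework. The function names (`map_document`, `reduce_query`, `run`) and the term-count scoring are assumptions made for illustration only. The map step scans each document once and scores it against every query. The reduce step keeps the best-scoring documents for each query.

```python
from collections import defaultdict
import heapq


def map_document(doc_id, text, queries):
    """Map step: scan one document and emit (query_id, (score, doc_id)) pairs."""
    tokens = text.lower().split()
    for query_id, query in queries.items():
        # Placeholder scoring (an assumption): number of query-term occurrences.
        score = sum(tokens.count(term) for term in query.lower().split())
        if score > 0:
            yield query_id, (score, doc_id)


def reduce_query(query_id, scored_docs, k=10):
    """Reduce step: keep the k highest-scoring documents for one query."""
    top = heapq.nlargest(k, scored_docs)
    return query_id, [(doc_id, score) for score, doc_id in top]


def run(documents, queries, k=10):
    """Simulate the map, group-by-key, and reduce phases in memory."""
    grouped = defaultdict(list)
    for doc_id, text in documents:  # one sequential pass over all documents
        for query_id, pair in map_document(doc_id, text, queries):
            grouped[query_id].append(pair)
    return dict(reduce_query(q, pairs, k) for q, pairs in grouped.items())


if __name__ == "__main__":
    docs = [
        ("d1", "map reduce on a cluster"),
        ("d2", "cluster cluster search"),
        ("d3", "unrelated text"),
    ]
    topics = {"q1": "cluster", "q2": "map reduce"}
    print(run(docs, topics, k=2))
```

In this sketch, testing a new retrieval approach only means changing the scoring line in `map_document`. No index has to be rebuilt, which is the kind of low-effort experimentation the abstract argues for.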
| Original language | Undefined |
|---|---|
| Title of host publication | International Conference of the Cross-Language Evaluation Forum, CLEF: Multilingual and Multimodal Information Access Evaluation |
| Editors | Maristella Agosti, Nicola Ferro, Carol Peters, Maarten de Rijke, Alan Smeaton |
| Place of Publication | Berlin |
| Publisher | Springer |
| Pages | 64-69 |
| Number of pages | 6 |
| ISBN (Print) | 978-3-642-15997-8 |
| Publication status | Published - 2010 |
| Event | CLEF (Cross-Language Evaluation Forum) Conference on Multilingual and Multimodal Information Access Evaluation 2010, Padua, Italy. Duration: 20 Sept 2010 → 23 Sept 2010. http://clef2010.clef-initiative.eu/ |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Publisher | Springer Verlag |
| Volume | 6360 |
Conference
| Conference | CLEF (Cross-Language Evaluation Forum) Conference on Multilingual and Multimodal Information Access Evaluation 2010 |
|---|---|
| Abbreviated title | CLEF |
| Country/Territory | Italy |
| City | Padua |
| Period | 20/09/10 → 23/09/10 |
Keywords
- IR-73226
- EWI-18469
- METIS-271032