MapReduce for information retrieval evaluation: "Let's quickly test this on 12 TB of data"

Djoerd Hiemstra, C. Hauff

Abstract

We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low-cost machines to search a web crawl of 0.5 billion pages, showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at http://mirex.sourceforge.net.
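The sequential-scanning idea in the abstract can be illustrated with a minimal, single-process sketch. It is not the authors' code (that is at the URL above) and it does not use Hadoop: the function names `map_phase`, `reduce_phase`, and `run`, the toy scoring function, and the top-k cutoff are all assumptions made for illustration. The map step reads every document once and scores it against every query, with no index; the reduce step keeps the best-scoring documents per query.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Doc = Tuple[str, str]        # (doc_id, text)
Scored = Tuple[float, str]   # (score, doc_id)


def score(query_terms: List[str], doc_terms: List[str]) -> float:
    """Toy score (an assumption): query-term occurrences divided by document length."""
    if not doc_terms:
        return 0.0
    hits = sum(doc_terms.count(term) for term in query_terms)
    return hits / len(doc_terms)


def map_phase(docs: Iterable[Doc], queries: Dict[str, str]) -> Iterator[Tuple[str, Scored]]:
    """Scan each document once and emit (query_id, (score, doc_id)) for every match."""
    parsed = {qid: text.lower().split() for qid, text in queries.items()}
    for doc_id, text in docs:
        doc_terms = text.lower().split()
        for qid, query_terms in parsed.items():
            s = score(query_terms, doc_terms)
            if s > 0:
                yield qid, (s, doc_id)


def reduce_phase(pairs: Iterable[Tuple[str, Scored]], k: int = 2) -> Dict[str, List[Scored]]:
    """Group the emitted pairs by query and keep the k highest-scoring documents."""
    grouped: Dict[str, List[Scored]] = defaultdict(list)
    for qid, scored in pairs:
        grouped[qid].append(scored)
    return {qid: heapq.nlargest(k, items) for qid, items in grouped.items()}


def run(docs: Iterable[Doc], queries: Dict[str, str], k: int = 2) -> Dict[str, List[Scored]]:
    """Run the map step followed by the reduce step in one process."""
    return reduce_phase(map_phase(docs, queries), k)
```

On a real cluster the framework would split the collection across machines, run the map step on each split in parallel, and route the emitted pairs to reducers by query id. Trying a new retrieval approach then means changing only `score` and rescanning, which is the low-effort workflow the abstract argues for.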
Title of host publication: International Conference of the Cross-Language Evaluation Forum, CLEF: Multilingual and Multimodal Information Access Evaluation
Editors: Maristella Agosti, Nicola Ferro, Carol Peters, Maarten de Rijke, Alan Smeaton
Place of publication: Berlin
Publisher: Springer Verlag
Pages: 64-69
Number of pages: 6
ISBN (Print): 978-3-642-15997-8
DOI: 10.1007/978-3-642-15998-5_8
State: Published - 2010

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Verlag
Volume: 6360

Fingerprint

Scanning
Information retrieval
Costs
Experiments

Keywords

  • IR-73226
  • EWI-18469
  • METIS-271032

Cite this

Hiemstra, D., & Hauff, C. (2010). MapReduce for information retrieval evaluation: "Let's quickly test this on 12 TB of data". In M. Agosti, N. Ferro, C. Peters, M. de Rijke, & A. Smeaton (Eds.), International Conference of the Cross-Language Evaluation Forum, CLEF: Multilingual and Multimodal Information Access Evaluation (pp. 64-69). (Lecture Notes in Computer Science; Vol. 6360). Berlin: Springer Verlag. DOI: 10.1007/978-3-642-15998-5_8

Hiemstra, Djoerd; Hauff, C. / MapReduce for information retrieval evaluation: "Let's quickly test this on 12 TB of data".

International Conference of the Cross-Language Evaluation Forum, CLEF: Multilingual and Multimodal Information Access Evaluation. ed. / Maristella Agosti; Nicola Ferro; Carol Peters; Maarten de Rijke; Alan Smeaton. Berlin : Springer Verlag, 2010. p. 64-69 (Lecture Notes in Computer Science; Vol. 6360).

Research output: Conference contribution (scientific, peer-reviewed)

@inproceedings{3177e4975a8b403087874dcb9555a573,
title = "MapReduce for information retrieval evaluation: {"}Let's quickly test this on 12 TB of data{"}",
abstract = "We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low cost machines to search a web crawl of 0.5 billion pages showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at: http://mirex.sourceforge.net.",
keywords = "IR-73226, EWI-18469, METIS-271032",
author = "Djoerd Hiemstra and C. Hauff",
note = "eemcs-eprint-18469",
year = "2010",
doi = "10.1007/978-3-642-15998-5_8",
isbn = "978-3-642-15997-8",
series = "Lecture Notes in Computer Science",
publisher = "Springer Verlag",
pages = "64--69",
editor = "Maristella Agosti and Nicola Ferro and Carol Peters and {de Rijke}, Maarten and Alan Smeaton",
booktitle = "International Conference of the Cross-Language Evaluation Forum, CLEF: Multilingual and Multimodal Information Access Evaluation",
volume = "6360",
address = "Berlin",
}

Hiemstra, D & Hauff, C 2010, MapReduce for information retrieval evaluation: "Let's quickly test this on 12 TB of data". in M Agosti, N Ferro, C Peters, M de Rijke & A Smeaton (eds), International Conference of the Cross-Language Evaluation Forum, CLEF: Multilingual and Multimodal Information Access Evaluation. Lecture Notes in Computer Science, vol. 6360, Springer Verlag, Berlin, pp. 64-69. DOI: 10.1007/978-3-642-15998-5_8

TY  - CHAP
T1  - MapReduce for information retrieval evaluation: "Let's quickly test this on 12 TB of data"
AU  - Hiemstra, Djoerd
AU  - Hauff, C.
N1  - eemcs-eprint-18469
PY  - 2010
Y1  - 2010
N2  - We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low cost machines to search a web crawl of 0.5 billion pages showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at: http://mirex.sourceforge.net.
AB  - We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low cost machines to search a web crawl of 0.5 billion pages showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at: http://mirex.sourceforge.net.
KW  - IR-73226
KW  - EWI-18469
KW  - METIS-271032
U2  - 10.1007/978-3-642-15998-5_8
DO  - 10.1007/978-3-642-15998-5_8
M3  - Conference contribution
SN  - 978-3-642-15997-8
T3  - Lecture Notes in Computer Science
SP  - 64
EP  - 69
BT  - International Conference of the Cross-Language Evaluation Forum, CLEF: Multilingual and Multimodal Information Access Evaluation
PB  - Springer Verlag
ER  -

Hiemstra D, Hauff C. MapReduce for information retrieval evaluation: "Let's quickly test this on 12 TB of data". In Agosti M, Ferro N, Peters C, de Rijke M, Smeaton A, editors, International Conference of the Cross-Language Evaluation Forum, CLEF: Multilingual and Multimodal Information Access Evaluation. Berlin: Springer Verlag. 2010. p. 64-69. (Lecture Notes in Computer Science; vol. 6360). DOI: 10.1007/978-3-642-15998-5_8