Abstract

We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low-cost machines to search a web crawl of 0.5 billion pages, showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at: http://sourceforge.net/projects/mirex/
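
The idea of answering queries by sequentially scanning every document, rather than consulting a prebuilt index, can be illustrated with a small single-machine sketch. This is not the MIREX code: the function names (`map_document`, `reduce_query`, `sequential_scan`), the term-count scoring, and the toy documents are all assumptions made for illustration. The map step scores one document against all queries, and the reduce step keeps the top-k documents per query.

```python
from collections import defaultdict
import heapq


def map_document(doc_id, text, queries):
    """Map step: score one document against every query.

    Emits (query_id, (score, doc_id)) pairs. The score is a plain count
    of query-term occurrences, an assumed stand-in for a real retrieval model.
    """
    tokens = text.lower().split()
    for query_id, query in queries.items():
        terms = query.lower().split()
        score = sum(tokens.count(term) for term in terms)
        if score > 0:
            yield query_id, (score, doc_id)


def reduce_query(pairs, k):
    """Reduce step: keep the k highest-scoring (score, doc_id) pairs."""
    return heapq.nlargest(k, pairs)


def sequential_scan(documents, queries, k=2):
    """Scan all documents once, grouping map output by query (the shuffle)."""
    grouped = defaultdict(list)
    for doc_id, text in documents:
        for query_id, pair in map_document(doc_id, text, queries):
            grouped[query_id].append(pair)
    return {q: reduce_query(pairs, k) for q, pairs in grouped.items()}


# Hypothetical toy collection and queries.
documents = [
    ("d1", "mapreduce makes cluster experiments simple"),
    ("d2", "sequential scanning of a web crawl"),
    ("d3", "scanning a crawl with mapreduce on a cluster"),
]
queries = {"q1": "mapreduce cluster", "q2": "web crawl"}

results = sequential_scan(documents, queries)
for query_id in sorted(results):
    print(query_id, results[query_id])
```

On a real cluster the map calls would run in parallel over partitions of the crawl, and trying a new retrieval approach would amount to swapping the scoring expression in the map step.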
Original language: Undefined
Place of Publication: Enschede
Publisher: Centre for Telematics and Information Technology (CTIT)
Number of pages: 7
State: Published - 14 Apr 2010

Publication series

Name: CTIT Technical Report Series
Publisher: Centre for Telematics and Information Technology, University of Twente
No.: TR-CTIT-10-15
ISSN (Print): 1381-3625

Fingerprint

Scanning
Information retrieval
Costs
Experiments

Keywords

  • METIS-270790
  • IR-71078
  • EWI-17797

Cite this

Hiemstra, D., & Hauff, C. (2010). MIREX: MapReduce Information Retrieval Experiments. (CTIT Technical Report Series; No. TR-CTIT-10-15). Enschede: Centre for Telematics and Information Technology (CTIT).

Hiemstra, Djoerd; Hauff, C. / MIREX: MapReduce Information Retrieval Experiments.

Enschede : Centre for Telematics and Information Technology (CTIT), 2010. 7 p. (CTIT Technical Report Series; No. TR-CTIT-10-15).

Research output: Professional report

@book{cd208a1a515c4aebacff9d59101a4452,
title = "MIREX: MapReduce Information Retrieval Experiments",
abstract = "We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low cost machines to search a web crawl of 0.5 billion pages showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at: http://sourceforge.net/projects/mirex/",
keywords = "METIS-270790, IR-71078, EWI-17797",
author = "Djoerd Hiemstra and C. Hauff",
year = "2010",
month = "4",
series = "CTIT Technical Report Series",
publisher = "Centre for Telematics and Information Technology (CTIT)",
number = "TR-CTIT-10-15",
address = "Netherlands",
}

Hiemstra, D & Hauff, C 2010, MIREX: MapReduce Information Retrieval Experiments. CTIT Technical Report Series, no. TR-CTIT-10-15, Centre for Telematics and Information Technology (CTIT), Enschede.

TY  - BOOK
T1  - MIREX: MapReduce Information Retrieval Experiments
AU  - Hiemstra, Djoerd
AU  - Hauff, C.
PY  - 2010/4/14
Y1  - 2010/4/14
N2  - We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low cost machines to search a web crawl of 0.5 billion pages showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at: http://sourceforge.net/projects/mirex/
AB  - We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use a cluster of 15 low cost machines to search a web crawl of 0.5 billion pages showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at: http://sourceforge.net/projects/mirex/
KW  - METIS-270790
KW  - IR-71078
KW  - EWI-17797
M3  - Report
T3  - CTIT Technical Report Series
BT  - MIREX: MapReduce Information Retrieval Experiments
PB  - Centre for Telematics and Information Technology (CTIT)
ER  -

Hiemstra D, Hauff C. MIREX: MapReduce Information Retrieval Experiments. Enschede: Centre for Telematics and Information Technology (CTIT), 2010. 7 p. (CTIT Technical Report Series; TR-CTIT-10-15).