Relating the new language models of information retrieval to the traditional retrieval models

Djoerd Hiemstra, A.P. de Vries

Research output: Book/Report › Report › Academic

Abstract

During the last two years, exciting new approaches to information retrieval were introduced by a number of different research groups that use statistical language models for retrieval. This paper relates the retrieval algorithms suggested by these approaches to widely accepted retrieval algorithms developed within three traditional models of information retrieval: the Boolean model, the vector space model and the probabilistic model. The paper shows the existence of efficient retrieval algorithms that only use the matching terms in their computation. Under these conditions, the language models of information retrieval are surprisingly similar to both tf.idf term weighting as developed for the vector space model and relevance weighting as developed in the traditional probabilistic model. The paper suggests a new method for relevance weighting and a new method to rank documents given Boolean queries. Experimental results on the TREC collection indicate that the language modelling approach outperforms the three traditional approaches.
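The abstract's claim that language-model ranking can be computed from the matching terms alone, with weights that resemble tf.idf, can be illustrated with a small sketch. It assumes a single smoothing parameter `lam` (λ) for linear interpolation, a maximum-likelihood document model P(t|d) = tf/|d|, and a background model P(t) built from document frequencies. These choices, the function names and the toy collection are illustrative assumptions, not code from the report. Because log((1−λ)P(t) + λP(t|d)) = log((1−λ)P(t)) + log(1 + λP(t|d) / ((1−λ)P(t))), and the first term depends only on the query, both functions below should rank documents identically.

```python
import math
from collections import Counter

# Assumption (illustrative only): one smoothing weight lam for all terms,
# P(t|d) = tf / |d| and P(t) = df / sum of all df values.

def full_log_likelihood(query, doc, df, lam=0.5):
    """log P(q|d) with linear interpolation smoothing, summed over ALL query terms."""
    tf, doc_len, total_df = Counter(doc), len(doc), sum(df.values())
    score = 0.0
    for t in query:
        p_background = df[t] / total_df  # P(t): document-frequency background model
        p_doc = tf[t] / doc_len          # P(t|d): maximum-likelihood document model
        score += math.log((1 - lam) * p_background + lam * p_doc)
    return score

def matching_terms_score(query, doc, df, lam=0.5):
    """Rank-equivalent score: only query terms that occur in the document contribute."""
    tf, doc_len, total_df = Counter(doc), len(doc), sum(df.values())
    score = 0.0
    for t in query:
        if tf[t] == 0:
            continue  # a non-matching term would add log(1 + 0) = 0
        # tf / doc_len acts like a length-normalised tf, total_df / df like an idf
        score += math.log(1 + (lam * tf[t] * total_df) / ((1 - lam) * df[t] * doc_len))
    return score

# Hypothetical toy collection, made up for this sketch.
docs = {
    "d1": "language models for information retrieval".split(),
    "d2": "the vector space model of retrieval".split(),
    "d3": "boolean queries and boolean retrieval".split(),
}
# Document frequency of each term; every query term must occur somewhere (df > 0).
df = Counter(t for words in docs.values() for t in set(words))
query = ["language", "retrieval"]

full = {d: full_log_likelihood(query, w, df) for d, w in docs.items()}
fast = {d: matching_terms_score(query, w, df) for d, w in docs.items()}
print(sorted(docs, key=full.get, reverse=True))  # ['d1', 'd3', 'd2']
print(sorted(docs, key=fast.get, reverse=True))  # ['d1', 'd3', 'd2']
```

The matching-terms version is what makes an inverted-file implementation cheap: documents that share no term with the query are never touched, and for every scored document the two scores differ by the same query-dependent constant.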
Original language: Undefined
Publisher: University of Twente
Number of pages: 14
Volume: 00
Publication status: Published - Jun 2000

Publication series

Name: CTIT Technical report series
No.: 00-09

Keywords

  • EWI-5950
  • METIS-118720
  • IR-18200

Cite this

Hiemstra, D., & de Vries, A. P. (2000). Relating the new language models of information retrieval to the traditional retrieval models. (CTIT Technical report series; No. 00-09). University of Twente.
Hiemstra, Djoerd ; de Vries, A.P. / Relating the new language models of information retrieval to the traditional retrieval models. University of Twente, 2000. 14 p. (CTIT Technical report series; 00-09).
@book{9eff184398994392a866f90ce8f00f71,
title = "Relating the new language models of information retrieval to the traditional retrieval models",
keywords = "EWI-5950, METIS-118720, IR-18200",
author = "Djoerd Hiemstra and {de Vries}, A.P.",
note = "Imported from CTIT",
year = "2000",
month = "6",
language = "Undefined",
volume = "00",
series = "CTIT Technical report series",
publisher = "University of Twente",
number = "00-09",
address = "Netherlands",

}

Hiemstra, D & de Vries, AP 2000, Relating the new language models of information retrieval to the traditional retrieval models. CTIT Technical report series, no. 00-09, vol. 00, University of Twente.

Relating the new language models of information retrieval to the traditional retrieval models. / Hiemstra, Djoerd; de Vries, A.P.

University of Twente, 2000. 14 p. (CTIT Technical report series; No. 00-09).

TY - BOOK

T1 - Relating the new language models of information retrieval to the traditional retrieval models

AU - Hiemstra, Djoerd

AU - de Vries, A.P.

N1 - Imported from CTIT

PY - 2000/6

Y1 - 2000/6

KW - EWI-5950

KW - METIS-118720

KW - IR-18200

M3 - Report

VL - 00

T3 - CTIT Technical report series

BT - Relating the new language models of information retrieval to the traditional retrieval models

PB - University of Twente

ER -

Hiemstra D, de Vries AP. Relating the new language models of information retrieval to the traditional retrieval models. University of Twente, 2000. 14 p. (CTIT Technical report series; 00-09).