A Linguistically Motivated Probabilistic Model of Information Retrieval

Djoerd Hiemstra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

83 Citations (Scopus)
234 Downloads (Pure)

Abstract

This paper presents a new probabilistic model of information retrieval. Its most important modeling assumption is that documents and queries are defined by an ordered sequence of single terms. This assumption is not made in well-known existing models of information retrieval, but it is essential in the field of statistical natural language processing. The paper uses advances already made in statistical natural language processing to formulate a probabilistic justification for tf × idf term weighting. It shows that this new probabilistic interpretation of tf × idf term weighting might lead to a better understanding of statistical ranking mechanisms, for example by explaining how they relate to coordination level ranking. A pilot experiment on the Cranfield test collection indicates that the presented model outperforms the vector space model with classical tf × idf and cosine length normalisation.
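The abstract does not state any formulas, so the following is only a minimal sketch of the baseline it names: a vector space model with classical tf × idf weights and cosine length normalisation, plus a coordination level count. It is not the paper's probabilistic model. The function names, the natural-log idf variant, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build one sparse tf x idf vector per document.

    docs is a list of token lists. The idf variant used here,
    log(N / df), is an assumption; many other variants exist.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in docs
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors (dicts term -> weight)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_tokens, docs):
    """Return (score, doc_index) pairs, best score first."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms unseen in the collection get weight 0 (an assumption).
    query = {
        term: tf * idf.get(term, 0.0)
        for term, tf in Counter(query_tokens).items()
    }
    scored = [(cosine(query, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


def coordination_level(query_tokens, doc_tokens):
    """Number of distinct query terms that occur in the document."""
    return len(set(query_tokens) & set(doc_tokens))


# Hypothetical toy collection, not data from the paper.
docs = [
    ["information", "retrieval", "model"],
    ["language", "model"],
    ["information", "retrieval", "retrieval"],
]
query = ["information", "retrieval"]
ranking = rank(query, docs)
```

In this toy collection both the first and third documents match two query terms, so coordination level alone cannot separate them; the tf × idf cosine score does, because the term frequencies and document lengths differ.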
Original language: English
Title of host publication: Research and Advanced Technology for Digital Libraries
Subtitle of host publication: Second European Conference, ECDL’98, Heraklion, Crete, Greece, September 21–23, 1998, Proceedings
Editors: Christos Nikolaou, Constantine Stephanidis
Place of Publication: Berlin, Heidelberg
Publisher: Springer
Pages: 569-584
Number of pages: 16
ISBN (Electronic): 978-3-540-49653-3
ISBN (Print): 978-3-540-65101-7
Publication status: Published - 1998
Event: 2nd European Conference on Research and Advanced Technology for Digital Libraries, ECDL 1998 - Heraklion, Greece
Duration: 21 Sept 1998 – 23 Sept 1998
Conference number: 2

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 1513
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd European Conference on Research and Advanced Technology for Digital Libraries, ECDL 1998
Abbreviated title: ECDL
Country/Territory: Greece
City: Heraklion
Period: 21/09/98 – 23/09/98

Keywords

  • HMI-MR: MULTIMEDIA RETRIEVAL
  • Statistical Natural Language Processing
  • Statistical Information Retrieval
  • Information Retrieval Theory
  • CR-H.3.3
