Abstract
This paper presents a new probabilistic model of information retrieval. The most important modeling assumption is that documents and queries are defined by an ordered sequence of single terms. This assumption is not made in well-known existing models of information retrieval, but it is essential in the field of statistical natural language processing. The paper uses advances already made in statistical natural language processing to formulate a probabilistic justification for tf×idf term weighting. It shows that the new probabilistic interpretation of tf×idf term weighting might lead to a better understanding of statistical ranking mechanisms, for example by explaining how they relate to coordination level ranking. A pilot experiment on the Cranfield test collection indicates that the presented model outperforms the vector space model with classical tf×idf weighting and cosine length normalisation.
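For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of vector space ranking with tf×idf weights and cosine length normalisation. It is not the probabilistic model proposed in the paper. The weighting `tf * log(N / df)` is one common variant chosen here as an assumption, and the function names `tfidf_vector` and `rank` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vector(terms, doc_freq, n_docs):
    """Return a cosine-normalised tf x idf vector as a dict term -> weight.

    Assumed weighting: tf * log(N / df). Terms that occur in no document
    of the collection are skipped.
    """
    counts = Counter(terms)
    weights = {
        t: tf * math.log(n_docs / doc_freq[t])
        for t, tf in counts.items()
        if doc_freq.get(t, 0) > 0
    }
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm > 0 else {}


def rank(query, docs):
    """Rank documents by cosine similarity to the query, best first."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(t for d in docs for t in set(d))
    q_vec = tfidf_vector(query, doc_freq, n_docs)
    scores = []
    for i, d in enumerate(docs):
        d_vec = tfidf_vector(d, doc_freq, n_docs)
        # Both vectors have unit length, so the dot product is the cosine.
        score = sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy collection (hypothetical); documents are ordered term sequences.
docs = [
    "probabilistic model of information retrieval".split(),
    "statistical natural language processing".split(),
    "vector space model with term weighting".split(),
]
ranking = rank("information retrieval model".split(), docs)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

In this toy collection the first document matches all three query terms and ranks first, while the second shares no term with the query and scores zero.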
Original language | English |
---|---|
Title of host publication | Research and Advanced Technology for Digital Libraries |
Subtitle of host publication | Second European Conference, ECDL’98, Heraklion, Crete, Greece, September 21–23, 1998, Proceedings |
Editors | Christos Nikolaou, Constantine Stephanidis |
Place of Publication | Berlin, Heidelberg |
Publisher | Springer |
Pages | 569-584 |
Number of pages | 16 |
ISBN (Electronic) | 978-3-540-49653-3 |
ISBN (Print) | 978-3-540-65101-7 |
Publication status | Published - 1998 |
Event | 2nd European Conference on Research and Advanced Technology for Digital Libraries, ECDL 1998, Heraklion, Greece. Duration: 21 Sept 1998 → 23 Sept 1998. Conference number: 2 |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer |
Volume | 1513 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 2nd European Conference on Research and Advanced Technology for Digital Libraries, ECDL 1998 |
---|---|
Abbreviated title | ECDL |
Country/Territory | Greece |
City | Heraklion |
Period | 21/09/98 → 23/09/98 |
Keywords
- HMI-MR: MULTIMEDIA RETRIEVAL
- Statistical Natural Language Processing
- Statistical Information Retrieval
- Information Retrieval Theory
- CR-H.3.3