Towards a Better Understanding of the Relationship Between Probabilistic Models in IR

Robin Aly, Thomas Demeester

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Probability of relevance (PR) models are generally assumed to implement the Probability Ranking Principle (PRP) of IR, and recent publications claim that PR models and language models are similar. However, a careful analysis reveals two gaps in the chain of reasoning behind this statement. First, the PRP considers the relevance of particular documents, whereas PR models consider the relevance of any query-document pair. Second, unlike PR models, language models consider draws of terms and documents. We bridge the first gap by showing how the probability measure of PR models can be used to define the probabilistic model of the PRP. Furthermore, we argue that, given the differences between PR models and language models, the second gap cannot be bridged at the level of the probabilistic model. We instead define a new PR model based on logistic regression whose score function is similar to that of the query likelihood model. The performance of the two models is strongly correlated, which provides a bridge for the second gap at the functional and ranking level. Understanding language models in relation to logistic regression models opens new research directions, which we propose as future work.
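The functional similarity mentioned in the abstract can be illustrated with a minimal sketch. This is not the model defined in the paper. It assumes a query likelihood score with Jelinek-Mercer smoothing and a hypothetical logistic model whose per-term feature, single shared weight, and bias are chosen here purely for illustration. All function names, the toy documents, and the parameter values are assumptions.

```python
import math
from collections import Counter


def term_probs(term, doc, collection):
    """Relative frequency of `term` in the document and in the collection."""
    p_doc = Counter(doc)[term] / len(doc) if doc else 0.0
    p_coll = Counter(collection)[term] / len(collection)
    return p_doc, p_coll


def query_likelihood(query, doc, collection, lam=0.5):
    """Log query likelihood with Jelinek-Mercer smoothing (lam is assumed)."""
    score = 0.0
    for term in query:
        p_doc, p_coll = term_probs(term, doc, collection)
        if p_coll == 0.0:
            continue  # term unseen in the collection: skipped in this sketch
        score += math.log((1 - lam) * p_doc + lam * p_coll)
    return score


def logistic_relevance(query, doc, collection, weight=1.0, bias=0.0, lam=0.5):
    """Hypothetical logistic model: sigmoid(bias + weight * sum of features).

    The feature log(1 + (1 - lam) * p_doc / (lam * p_coll)) is the
    document-dependent part of the smoothed query likelihood above.
    """
    logit = bias
    for term in query:
        p_doc, p_coll = term_probs(term, doc, collection)
        if p_coll == 0.0:
            continue
        logit += weight * math.log(1 + (1 - lam) * p_doc / (lam * p_coll))
    return 1.0 / (1.0 + math.exp(-logit))


def rank(score_fn, query, docs, collection):
    """Document ids sorted by descending score."""
    return sorted(docs, key=lambda d: score_fn(query, docs[d], collection),
                  reverse=True)


# Toy data, invented for this example.
DOCS = {
    "d1": "probabilistic models rank documents by relevance".split(),
    "d2": "language models generate query terms".split(),
    "d3": "cooking recipes for pasta".split(),
}
COLLECTION = [t for doc in DOCS.values() for t in doc]
QUERY = "language models".split()

print(rank(query_likelihood, QUERY, DOCS, COLLECTION))
print(rank(logistic_relevance, QUERY, DOCS, COLLECTION))
```

With one shared positive weight, the logistic score is a monotone transform of the query likelihood score, so the two rankings coincide in this sketch. A logistic regression model fitted to relevance judgments would learn its own weights, in which case the rankings would be expected to correlate rather than match exactly.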
Original language: English
Title of host publication: Advances in Information Retrieval Theory
Subtitle of host publication: Third International Conference, ICTIR 2011, Bertinoro, Italy, September 12-14, 2011. Proceedings
Editors: Giambattista Amati, Fabio Crestani
Place of Publication: Berlin, Heidelberg
Publisher: Springer
Pages: 164-175
Number of pages: 12
ISBN (Electronic): 978-3-642-23318-0
ISBN (Print): 978-3-642-23317-3
DOIs: https://doi.org/10.1007/978-3-642-23318-0_16
Publication status: Published - Sep 2011
Event: 3rd International Conference on Advances in Information Retrieval Theory, ICTIR 2011 - Bertinoro, Italy
Duration: 12 Sep 2011 to 14 Sep 2011
Conference number: 3

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Verlag
Volume: 6931
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 3rd International Conference on Advances in Information Retrieval Theory, ICTIR 2011
Abbreviated title: ICTIR
Country: Italy
City: Bertinoro
Period: 12/09/11 to 14/09/11

Keywords

  • METIS-278719
  • EWI-20202
  • IR-78116

Cite this

Aly, R., & Demeester, T. (2011). Towards a Better Understanding of the Relationship Between Probabilistic Models in IR. In G. Amati, & F. Crestani (Eds.), Advances in Information Retrieval Theory: Third International Conference, ICTIR 2011, Bertinoro, Italy, September 12-14, 2011. Proceedings (pp. 164-175). (Lecture Notes in Computer Science; Vol. 6931). Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-642-23318-0_16