The main obstacle to providing focused search is the relative opaqueness of search requests: searchers tend to express their complex information needs in only a couple of keywords. Our overall aim is to find out whether, and how, topic-based language models can lead to more effective web information retrieval. In this paper we explore the retrieval performance of a topic-based model that combines topical models with other language models based on cross-entropy. We first define our topical categories and train our topical models on the .GOV2 corpus by building parsimonious language models. We then test the topic-based model on the TREC8 small Web data collection for ad hoc search. Our experimental results show that the topic-based model outperforms both the standard language model and the parsimonious model.
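The abstract gives no formulas, so the following is only a minimal sketch of the two ingredients it names, not the authors' implementation. It assumes the commonly used EM formulation of parsimonious language models and a cross-entropy score over a linear mixture of document, topic, and background models. The function names, the mixture weights, and the values of `lam` and `threshold` are hypothetical choices made for illustration.

```python
import math


def parsimonious_lm(tf, background, lam=0.1, iters=20, threshold=1e-4):
    """EM estimate of a parsimonious model from term counts `tf`.

    `background` maps terms to P(t|C); `lam` weights the specific model.
    All parameter values here are illustrative assumptions.
    """
    total = sum(tf.values())
    p = {t: c / total for t, c in tf.items()}  # start from maximum likelihood
    for _ in range(iters):
        # E-step: expected counts explained by the specific model
        # rather than by the background model.
        e = {
            t: tf[t] * lam * pt / ((1 - lam) * background.get(t, 1e-9) + lam * pt)
            for t, pt in p.items()
        }
        norm = sum(e.values())
        # M-step: renormalize, prune terms below the threshold, renormalize again.
        p = {t: v / norm for t, v in e.items() if v / norm >= threshold}
        z = sum(p.values())
        p = {t: v / z for t, v in p.items()}
    return p


def cross_entropy_score(query_model, doc_model, topic_model, background,
                        w_doc=0.5, w_topic=0.3, w_bg=0.2):
    """Negative cross-entropy between the query model and a mixture model.

    Higher is better. The three weights are hypothetical and sum to 1.
    """
    score = 0.0
    for t, pq in query_model.items():
        mix = (w_doc * doc_model.get(t, 0.0)
               + w_topic * topic_model.get(t, 0.0)
               + w_bg * background.get(t, 1e-9))
        score += pq * math.log(mix)
    return score
```

In this sketch, terms that the background model already explains well (such as very frequent words) lose probability mass during EM and are eventually pruned, which is what keeps the topical models compact. Documents would then be ranked by `cross_entropy_score`, with the topic model chosen by whatever category the query is assigned to.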
|Title of host publication||Proceedings of the Dutch-Belgian Information Retrieval Workshop (DIR 2008)|
|Editors||E. Hoenkamp, M. De Cock, V. Hoste|
|Place of Publication||Enschede|
|Number of pages||7|
|Publication status||Published - 14 Apr 2008|
|Event||8th Dutch-Belgian Information Retrieval Workshop, DIR 2008 - Maastricht, Netherlands (Duration: 14 Apr 2008 → 15 Apr 2008; Conference number: 8)|
|Conference||8th Dutch-Belgian Information Retrieval Workshop, DIR 2008|
|Period||14/04/08 → 15/04/08|
- DB-IR: INFORMATION RETRIEVAL
Li, R., Kaptein, R., Hiemstra, D., & Kamps, J. (2008). Exploring Topic-based Language Models for Effective Web Information Retrieval. In E. Hoenkamp, M. De Cock, & V. Hoste (Eds.), Proceedings of the Dutch-Belgian Information Retrieval Workshop (DIR 2008) (pp. 65-71). Enschede: Neslia Paniculata.