Predicting the cost-quality trade-off for information retrieval queries: Facilitating database design and query optimization

Henk Ernst Blok, Djoerd Hiemstra, Sunil Choenni, Franciska de Jong, Henk M. Blanken, Peter M.G. Apers

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


Abstract

Efficient, flexible, and scalable integration of full-text information retrieval (IR) into a DBMS is not trivial. This holds in particular for query optimization in such a context. To facilitate the bulk-oriented behavior of database query processing, a priori knowledge of how to limit the data efficiently before query evaluation is very valuable at optimization time. The usually imprecise nature of IR querying provides an extra opportunity to limit the data by trading off answer quality. In this paper we present a mathematically derived model to predict the quality implications of neglecting information before query execution. In particular, we investigate the possibility of predicting the retrieval quality for a document collection for which no training information is available, which is usually the case in practice. Instead, we construct a model that can be trained on other document collections for which the necessary quality information is available or can be obtained quite easily. We validate our model on several document collections and present the experimental results. These results show that our model performs quite well, even when it was not trained on the test collection itself.
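The keywords point to fragmentation and Zipf's law. As a rough illustration of the cost side of the trade-off only (this is not the authors' quality-prediction model, which the abstract does not spell out), the sketch below assumes that term frequencies follow a pure Zipf distribution f(r) ∝ 1/r^s. It computes what fraction of all term occurrences would be skipped if the `dropped` most frequent terms were ignored before query evaluation. The function name and the pure-Zipf assumption are hypothetical.

```python
def zipf_fraction_of_postings(num_terms: int, dropped: int, s: float = 1.0) -> float:
    """Fraction of all term occurrences held by the `dropped` most frequent
    terms, ASSUMING term frequencies follow Zipf's law f(r) ~ 1 / r**s.

    Illustrative only: this estimates how much data (cost) is skipped by
    ignoring the highest-frequency terms, not the effect on answer quality.
    """
    if num_terms < 1:
        raise ValueError("num_terms must be at least 1")
    if not 0 <= dropped <= num_terms:
        raise ValueError("dropped must be between 0 and num_terms")
    # Unnormalised Zipf weight for each rank r = 1..num_terms.
    weights = [1.0 / r**s for r in range(1, num_terms + 1)]
    # Share of the total mass carried by the top-`dropped` ranks.
    return sum(weights[:dropped]) / sum(weights)


if __name__ == "__main__":
    share = zipf_fraction_of_postings(num_terms=100_000, dropped=100)
    print(f"Top 100 of 100,000 terms hold {share:.1%} of all occurrences")
```

With s = 1, ignoring the 100 most frequent of 100,000 terms skips roughly 43% of all term occurrences. This is why such a skewed distribution makes limiting the data attractive, provided the quality loss can be predicted.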
Original language: English
Title of host publication: CIKM '01
Subtitle of host publication: Proceedings of the Tenth International Conference on Information and Knowledge Management
Place of publication: New York, NY, USA
Publisher: ACM Press
Pages: 207-214
Number of pages: 8
ISBN (Print): 1-58113-436-3
DOIs
Publication status: Published - Nov 2001
Event: 10th International Conference on Information and Knowledge Management, CIKM 2001 - Atlanta, United States
Duration: 5 Nov 2001 to 10 Nov 2001
Conference number: 10

Conference

Conference: 10th International Conference on Information and Knowledge Management, CIKM 2001
Abbreviated title: CIKM
Country/Territory: United States
City: Atlanta
Period: 5/11/01 to 10/11/01

Keywords

  • DB-IR: INFORMATION RETRIEVAL
  • Quality
  • Efficiency
  • Trade-offs
  • Fragmentation
  • Zipf
  • Information retrieval
  • Databases
