Predicting the cost-quality trade-off for information retrieval queries: Facilitating database design and query optimization

Henk Ernst Blok, Djoerd Hiemstra, Sunil Choenni, Franciska de Jong, Henk M. Blanken, Peter M.G. Apers

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

7 Citations (Scopus)

Abstract

Efficient, flexible, and scalable integration of full text information retrieval (IR) in a DBMS is not trivial. This holds in particular for query optimization in such a context. To facilitate the bulk-oriented behavior of database query processing, a priori knowledge of how to limit the data efficiently prior to query evaluation is very valuable at optimization time. The usually imprecise nature of IR querying provides an extra opportunity to limit the data by a trade-off with the quality of the answer. In this paper we present a mathematically derived model to predict the quality implications of neglecting information before query execution. In particular we investigate the possibility of predicting the retrieval quality for a document collection for which no training information is available, which is usually the case in practice. Instead, we construct a model that can be trained on other document collections for which the necessary quality information is available, or can be obtained quite easily. We validate our model for several document collections and present the experimental results. These results show that our model performs quite well, even for the case where we did not train it on the test collection itself.
Original language: English
Title of host publication: CIKM '01
Subtitle of host publication: Proceedings of the Tenth International Conference on Information and Knowledge Management
Place of Publication: New York, NY, USA
Publisher: ACM Press
Pages: 207-214
Number of pages: 8
ISBN (Print): 1-58113-436-3
DOI: https://doi.org/10.1145/502585.502621
Publication status: Published - Nov 2001
Event: 10th International Conference on Information and Knowledge Management, CIKM 2001 - Atlanta, United States
Duration: 5 Nov 2001 to 10 Nov 2001
Conference number: 10

Conference

Conference: 10th International Conference on Information and Knowledge Management, CIKM 2001
Abbreviated title: CIKM
Country: United States
City: Atlanta
Period: 5/11/01 to 10/11/01

Keywords

  • DB-IR: INFORMATION RETRIEVAL
  • Quality
  • Efficiency
  • Trade-offs
  • Fragmentation
  • Zipf
  • Information retrieval
  • Databases

Cite this

Blok, H. E., Hiemstra, D., Choenni, S., de Jong, F., Blanken, H. M., & Apers, P. M. G. (2001). Predicting the cost-quality trade-off for information retrieval queries: Facilitating database design and query optimization. In CIKM '01: Proceedings of the Tenth International Conference on Information and Knowledge Management (pp. 207-214). New York, NY, USA: ACM Press. https://doi.org/10.1145/502585.502621