Abstract
Large document collections can be partitioned into topical shards to facilitate distributed search. In a low-resource search environment, only a few of the shards can be searched in parallel. Such an environment faces two intertwined challenges. The first is shard ranking: determining which shards to consult for a given query. The second is cutoff estimation: deciding how many shards to consult from the ranking. In this paper we present a family of three algorithms that address both of these problems. As a basis we employ a commonly used data structure, the central sample index (CSI), to represent the shard contents. Running a query against the CSI yields a flat document ranking that each of our algorithms transforms into a tree structure. A bottom-up traversal of the tree is used to infer a ranking of shards and also to estimate a stopping point in this ranking that yields cost-effective selective distributed search. Compared to a state-of-the-art shard ranking approach, the proposed algorithms provide substantially higher search efficiency while providing comparable search effectiveness.
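The general idea of turning a CSI document ranking into a shard ranking with a built-in stopping point can be illustrated with a minimal sketch. It does not reproduce the tree construction or traversal used by the three algorithms in the paper, which the abstract does not specify. Instead it substitutes a simple rank-decayed voting scheme. The function name `rank_shards` and the parameters `base` and `min_vote` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def rank_shards(csi_ranking, base=2.0, min_vote=0.01):
    """Rank shards from a CSI result list (illustrative sketch only).

    csi_ranking: shard ids of the sampled documents returned by the CSI,
                 best-ranked document first (assumed input format).
    base:        decay base; the document at rank r votes base ** -r.
    min_vote:    votes smaller than this are ignored, which ends the scan
                 and so limits how many shards can be selected.
    """
    scores = defaultdict(float)
    for rank, shard_id in enumerate(csi_ranking):
        vote = base ** -rank
        if vote < min_vote:
            break  # deeper documents contribute too little; stop here
        scores[shard_id] += vote
    # Shards that never received a vote are left out of the result,
    # so the length of the returned list acts as the cutoff.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Four sampled documents, ranked best first, labelled by their shard.
    print(rank_shards(["A", "B", "A", "C"], base=2.0, min_vote=0.2))
```

In this sketch the ranking and the cutoff come from the same pass over the CSI results: only shards that collect at least one sufficiently large vote are returned, so a larger `base` or `min_vote` selects fewer shards.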
Original language | Undefined |
---|---|
Title of host publication | Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM 2012 |
Place of Publication | New York |
Publisher | Association for Computing Machinery |
Pages | 555-564 |
Number of pages | 10 |
ISBN (Print) | 978-1-4503-1156-4 |
Publication status | Published - 29 Oct 2012 |
Event | 21st ACM International Conference on Information and Knowledge Management, CIKM 2012 - Maui, United States. Duration: 29 Oct 2012 → 2 Nov 2012. Conference number: 21 |
Publication series
Name | |
---|---|
Publisher | ACM |
Conference
Conference | 21st ACM International Conference on Information and Knowledge Management, CIKM 2012 |
---|---|
Abbreviated title | CIKM |
Country/Territory | United States |
City | Maui |
Period | 29/10/12 → 2/11/12 |
Keywords
- METIS-296085
- IR-81533
- EWI-22218
- selective search
- CR-H.3
- Distributed Information Retrieval