Shard Ranking and Cutoff Estimation for Topically Partitioned Collections

Anagha Kulkarni, A.S. Tigelaar, Djoerd Hiemstra, Jamie Callan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

22 Citations (Scopus)
19 Downloads (Pure)

Abstract

Large document collections can be partitioned into topical shards to facilitate distributed search. In a low-resource search environment, only a few of the shards can be searched in parallel. Such an environment faces two intertwined challenges: first, determining which shards to consult for a given query (shard ranking); second, determining how many shards from that ranking to consult (cutoff estimation). In this paper we present a family of three algorithms that address both problems. As a basis we employ a commonly used data structure, the central sample index (CSI), to represent the shard contents. Running a query against the CSI yields a flat document ranking, which each of our algorithms transforms into a tree structure. A bottom-up traversal of this tree is used to infer a ranking of shards and to estimate a stopping point in that ranking that yields cost-effective selective distributed search. Compared to a state-of-the-art shard ranking approach, the proposed algorithms provide substantially higher search efficiency with comparable search effectiveness.
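The general idea of turning a CSI document ranking into a shard ranking plus a cutoff can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it skips the tree construction and bottom-up traversal described above and instead uses a simpler, hypothetical flat scheme in which every CSI result votes for its shard with a weight that decays exponentially with rank. The input format `(doc_id, shard_id, score)`, the function name `rank_shards`, and the parameters `decay_base` and `min_vote` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import List, Tuple

# Hypothetical CSI result: (doc_id, shard_id, retrieval score), best first.
CsiResult = Tuple[str, str, float]


def rank_shards(
    csi_results: List[CsiResult],
    decay_base: float = 2.0,
    min_vote: float = 1e-4,
) -> List[Tuple[str, float]]:
    """Illustrative flat shard ranking with a vote-threshold cutoff.

    Assumption (not from the paper): the document at rank r (0-based)
    adds score * decay_base ** -r to its shard's vote. Shards are ranked
    by total vote, and the ranking is cut where the vote drops below
    min_vote, which acts as the estimated stopping point.
    """
    if decay_base <= 1.0:
        raise ValueError("decay_base must be greater than 1 so votes decay")

    votes = defaultdict(float)
    for rank, (_doc_id, shard_id, score) in enumerate(csi_results):
        # Lower-ranked CSI documents contribute exponentially less.
        votes[shard_id] += score * decay_base ** (-rank)

    # Highest vote first; ties broken by shard id for a stable order.
    ranking = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))

    # Cutoff estimation: keep only shards whose vote clears the threshold.
    return [(shard, vote) for shard, vote in ranking if vote >= min_vote]


if __name__ == "__main__":
    results = [("d1", "A", 10.0), ("d2", "B", 8.0), ("d3", "A", 6.0), ("d4", "C", 1.0)]
    print(rank_shards(results, decay_base=2.0, min_vote=0.5))
```

With these toy inputs, shard A collects 10 + 6/4 = 11.5, shard B collects 8/2 = 4.0, and shard C collects 1/8 = 0.125, so a `min_vote` of 0.5 keeps A and B and cuts C. An absolute threshold is the simplest possible cutoff rule; a relative one (for example, a fraction of the top shard's vote) would be an equally plausible design choice.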
Original language: Undefined
Title of host publication: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM 2012
Place of publication: New York
Publisher: Association for Computing Machinery (ACM)
Pages: 555-564
Number of pages: 10
ISBN (Print): 978-1-4503-1156-4
DOI: https://doi.org/10.1145/2396761.2396833
Publication status: Published, 29 Oct 2012
Event: 21st ACM International Conference on Information and Knowledge Management, CIKM 2012, Maui, United States
Duration: 29 Oct 2012 to 2 Nov 2012
Conference number: 21

Publication series

Publisher: ACM

Conference

Conference: 21st ACM International Conference on Information and Knowledge Management, CIKM 2012
Abbreviated title: CIKM
Country: United States
City: Maui
Period: 29/10/12 to 2/11/12

Keywords

  • METIS-296085
  • IR-81533
  • EWI-22218
  • selective search
  • CR-H.3
  • Distributed Information Retrieval

Cite this

Kulkarni, A., Tigelaar, A. S., Hiemstra, D., & Callan, J. (2012). Shard Ranking and Cutoff Estimation for Topically Partitioned Collections. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM 2012 (pp. 555-564). New York: Association for Computing Machinery (ACM). https://doi.org/10.1145/2396761.2396833