Taily: shard selection using the tail of score distributions

Robin Aly, Djoerd Hiemstra, Thomas Demeester

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Search engines can improve their efficiency by selecting only a few promising shards for each query. State-of-the-art shard selection algorithms first query a central index of sampled documents, and their effectiveness is similar to searching all shards. However, the search in the central index also hurts efficiency. Additionally, we show that the effectiveness of these approaches varies substantially with the sampled documents. This paper proposes Taily, a novel shard selection algorithm that models a query's score distribution in each shard as a Gamma distribution and selects shards with highly scored documents in the tail of the distribution. Taily estimates the parameters of score distributions based on the mean and variance of the score function's features in the collections and shards. Because Taily operates on term statistics instead of document samples, it is efficient and has deterministic effectiveness. Experiments on large web collections (Gov2, CluewebA and CluewebB) show that Taily achieves similar effectiveness to sample-based approaches, and improves upon their efficiency by roughly 20% in terms of resources used and response time.
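
The core idea in the abstract (fit a Gamma distribution from a mean and a variance, then look at its tail) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the per-shard mean and variance of the query's document scores are already available, fits the Gamma parameters by the method of moments, and keeps shards whose expected number of documents above a given score cutoff reaches a minimum. The names `fit_gamma`, `gamma_sf`, `select_shards`, `cutoff` and `min_expected` are hypothetical and do not come from the paper.

```python
import math


def fit_gamma(mean, variance):
    """Method-of-moments Gamma fit: shape = mean^2 / var, scale = var / mean."""
    if mean <= 0 or variance <= 0:
        raise ValueError("mean and variance must be positive")
    return mean * mean / variance, variance / mean


def gamma_sf(x, shape, scale, eps=1e-12, max_iter=1000):
    """Tail probability P(X > x) for X ~ Gamma(shape, scale)."""
    if x <= 0:
        return 1.0
    z = x / scale
    log_prefix = shape * math.log(z) - z - math.lgamma(shape)
    if z < shape + 1.0:
        # Series expansion of the regularized lower incomplete gamma function.
        a = shape
        term = 1.0 / shape
        total = term
        for _ in range(max_iter):
            a += 1.0
            term *= z / a
            total += term
            if abs(term) < abs(total) * eps:
                break
        return max(0.0, 1.0 - total * math.exp(log_prefix))
    # Continued fraction (modified Lentz) for the regularized upper function.
    tiny = 1e-300
    b = z + 1.0 - shape
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter + 1):
        an = -i * (i - shape)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return min(1.0, h * math.exp(log_prefix))


def select_shards(shards, cutoff, min_expected):
    """Return shard names whose expected count of documents scoring above
    `cutoff` is at least `min_expected`, best shard first.

    `shards` maps a name to (n_matching_docs, score_mean, score_variance);
    this input layout is an assumption made for the sketch.
    """
    expected = {}
    for name, (n_docs, mean, variance) in shards.items():
        shape, scale = fit_gamma(mean, variance)
        expected[name] = n_docs * gamma_sf(cutoff, shape, scale)
    keep = [name for name, e in expected.items() if e >= min_expected]
    return sorted(keep, key=lambda name: expected[name], reverse=True)


# Illustrative (made-up) statistics for two shards.
example = {"A": (1000, 2.0, 4.0), "B": (1000, 1.0, 1.0)}
print(select_shards(example, cutoff=6.0, min_expected=10))  # ['A']
```

In this toy input both shards hold the same number of matching documents, but shard A has the heavier tail, so only A is expected to contain enough documents above the cutoff. How the cutoff and the minimum are chosen is left open here.
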
Original language: Undefined
Title of host publication: Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013
Place of Publication: New York
Publisher: Association for Computing Machinery
Pages: 673-682
Number of pages: 10
ISBN (Print): 978-1-4503-2034-4
DOIs
Publication status: Published - Jul 2013
Event: 36th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013 - Dublin, Ireland
Duration: 29 Jul 2013 to 1 Aug 2013
Conference number: 36
http://www.sigir.org/sigir2013/

Publication series

Name
Publisher: ACM

Conference

Conference: 36th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013
Abbreviated title: SIGIR
Country/Territory: Ireland
City: Dublin
Period: 29/07/13 to 1/08/13
Internet address

Keywords

  • distributed retrieval
  • EWI-23570
  • METIS-297771
  • IR-87300
  • Database selection
