Abstract
Search engines can improve their efficiency by selecting only a few promising shards for each query. State-of-the-art shard selection algorithms first query a central index of sampled documents, and their effectiveness is similar to searching all shards. However, the search in the central index also hurts efficiency. Additionally, we show that the effectiveness of these approaches varies substantially with the sampled documents. This paper proposes Taily, a novel shard selection algorithm that models a query's score distribution in each shard as a Gamma distribution and selects shards with highly scored documents in the tail of the distribution. Taily estimates the parameters of the score distributions from the mean and variance of the score function's features in the collections and shards. Because Taily operates on term statistics instead of document samples, it is efficient and has deterministic effectiveness. Experiments on large web collections (Gov2, CluewebA and CluewebB) show that Taily achieves effectiveness similar to that of sample-based approaches and improves upon their efficiency by roughly 20% in terms of used resources and response time.
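The idea in the abstract (fit a Gamma distribution from a mean and a variance, then look at its tail) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the per-shard score mean and variance for a query are already available, whereas the paper derives them from term statistics. The names `gamma_params`, `gamma_sf` and `select_shards`, and the parameters `score_cutoff` and `min_docs`, are hypothetical and chosen only for this illustration.

```python
import math


def gamma_params(mean, variance):
    """Method-of-moments fit: return (shape k, scale theta) of a Gamma distribution."""
    if mean <= 0 or variance <= 0:
        raise ValueError("mean and variance must be positive")
    return mean * mean / variance, variance / mean


def gamma_sf(x, k, theta, tol=1e-15, max_terms=100_000):
    """P(X > x) for X ~ Gamma(k, theta), computed from the power series of the
    regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    z = x / theta
    term = 1.0 / k
    total = term
    for n in range(1, max_terms):
        term *= z / (k + n)
        total += term
        if term < tol * total:
            break
    cdf = math.exp(k * math.log(z) - z - math.lgamma(k)) * total
    return min(1.0, max(0.0, 1.0 - cdf))


def select_shards(shard_stats, score_cutoff, min_docs=1.0):
    """Keep shards expected to hold at least min_docs documents above score_cutoff.

    shard_stats maps a shard id to (score mean, score variance, matching documents).
    Both score_cutoff and min_docs are assumed inputs for this sketch.
    """
    selected = []
    for shard_id, (mean, variance, n_docs) in shard_stats.items():
        k, theta = gamma_params(mean, variance)
        expected_in_tail = n_docs * gamma_sf(score_cutoff, k, theta)
        if expected_in_tail >= min_docs:
            selected.append(shard_id)
    return selected
```

Because the sketch works only on summary statistics, the same inputs always give the same selection, which mirrors the deterministic behaviour the abstract describes. The series used in `gamma_sf` is adequate for moderate arguments; a library routine for the incomplete gamma function would be a more robust choice for large ones.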
Original language | Undefined |
---|---|
Title of host publication | Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013 |
Place of Publication | New York |
Publisher | Association for Computing Machinery |
Pages | 673-682 |
Number of pages | 10 |
ISBN (Print) | 978-1-4503-2034-4 |
Publication status | Published - Jul 2013 |
Event | 36th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013 - Dublin, Ireland Duration: 29 Jul 2013 → 1 Aug 2013 Conference number: 36 http://www.sigir.org/sigir2013/ |
Publication series
Name | |
---|---|
Publisher | ACM |
Conference
Conference | 36th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013 |
---|---|
Abbreviated title | SIGIR |
Country/Territory | Ireland |
City | Dublin |
Period | 29/07/13 → 1/08/13 |
Keywords
- distributed retrieval
- EWI-23570
- METIS-297771
- IR-87300
- Database selection