The centralized web search paradigm introduces several problems, such as the large data traffic required for crawling, the difficulty of keeping the index fresh, and the difficulty of indexing everything. In this study, we look at collection selection using highly discriminative keys and query-driven indexing as part of a distributed web search system. The approach is evaluated on different splits of the TREC WT10g corpus. Experimental results show that the approach outperforms a language modeling approach with Dirichlet smoothing for collection selection, under the assumption that web servers index their own local content.
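The following minimal Python sketch shows what a Dirichlet-smoothed language modeling baseline for collection selection can look like. It assumes that each collection is scored as one large concatenated document, with score(C, q) = Σ_t log((tf(t, C) + μ·P(t | G)) / (|C| + μ)), where P(t | G) is a background model built from all collections. The function name `rank_collections`, the toy data, and the value of `mu` are illustrative assumptions. This is not the authors' implementation, and it does not implement the highly discriminative keys approach.

```python
import math
from collections import Counter


def rank_collections(query, collections, mu=2000.0):
    """Rank collections by Dirichlet-smoothed query log-likelihood.

    Assumption for this sketch: each collection is treated as one large
    document. `collections` maps a collection name to its list of tokens.
    """
    counts = {name: Counter(tokens) for name, tokens in collections.items()}
    lengths = {name: len(tokens) for name, tokens in collections.items()}

    # Background model P(t|G) over the union of all collections.
    global_counts = Counter()
    for c in counts.values():
        global_counts.update(c)
    global_len = sum(lengths.values())
    if global_len == 0:
        return sorted(collections)

    scores = {}
    for name, c in counts.items():
        score = 0.0
        for term in query:
            p_bg = global_counts[term] / global_len
            if p_bg == 0.0:
                # Unseen in every collection: it would add log(0) to all
                # scores alike, so it cannot change the ranking; skip it.
                continue
            p = (c[term] + mu * p_bg) / (lengths[name] + mu)
            score += math.log(p)
        scores[name] = score

    # Highest log-likelihood first; ties broken by name for determinism.
    return sorted(scores, key=lambda n: (-scores[n], n))


# Hypothetical toy collections standing in for two web servers.
collections = {
    "serverA": "web search crawling index freshness".split(),
    "serverB": "collection selection discriminative keys".split(),
}
print(rank_collections(["discriminative", "keys"], collections, mu=10.0))
```

With these toy inputs, `serverB` ranks first because it is the only collection containing the query terms. The smoothing parameter `mu` controls how strongly short collections are pulled toward the background model.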
| Field | Value |
|---|---|
| Title of host publication | Proceedings of the 7th Workshop on Large-Scale Distributed Systems for Information Retrieval |
| Number of pages | 8 |
| Publication status | Published - 23 Jul 2009 |
| Series name | CEUR Workshop Series |
- Distributed Information Retrieval
Bockting, S., & Hiemstra, D. (2009). Collection Selection with Highly Discriminative Keys. In Proceedings of the 7th Workshop on Large-Scale Distributed Systems for Information Retrieval (pp. 9-16). (CEUR Workshop Series; Vol. 480). CEUR.