The centralized web search paradigm introduces several problems, such as the large data traffic required for crawling, difficulty keeping the index fresh, and the inability to index everything. In this study, we examine collection selection using highly discriminative keys and query-driven indexing as part of a distributed web search system. The approach is evaluated on different splits of the TREC WT10g corpus. Experimental results show that the approach outperforms a Dirichlet-smoothed language modeling approach to collection selection, provided that web servers index their local content.
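For readers unfamiliar with the baseline named above, the following is a minimal sketch of collection selection with a Dirichlet-smoothed language model. It is not the implementation used in the study: the function names, the `mu` default, and the toy term counts are all assumptions chosen for illustration. Each collection is scored by the log-likelihood of the query, where a term's probability is `(tf + mu * p_background) / (collection_length + mu)`.

```python
import math
from collections import Counter


def dirichlet_score(query_terms, collection_tf, collection_len,
                    global_tf, global_len, mu=1000.0):
    """Log-likelihood of the query under a Dirichlet-smoothed collection model.

    All parameter names and the default mu are illustrative assumptions.
    """
    score = 0.0
    for term in query_terms:
        # Background probability estimated from the union of all collections.
        p_bg = global_tf.get(term, 0) / global_len if global_len else 0.0
        if p_bg == 0.0:
            # Term occurs in no collection, so it cannot discriminate; skip it.
            continue
        p = (collection_tf.get(term, 0) + mu * p_bg) / (collection_len + mu)
        score += math.log(p)
    return score


def rank_collections(query_terms, collections, mu=1000.0):
    """Return collection names ordered from best to worst score."""
    global_tf = Counter()
    for tf in collections.values():
        global_tf.update(tf)
    global_len = sum(global_tf.values())
    scores = {
        name: dirichlet_score(query_terms, tf, sum(tf.values()),
                              global_tf, global_len, mu)
        for name, tf in collections.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy term counts for two hypothetical web servers.
collections = {
    "server_a": Counter({"search": 5, "web": 5}),
    "server_b": Counter({"cooking": 9, "web": 1}),
}
ranking = rank_collections(["search"], collections, mu=10.0)
print(ranking)
```

In this toy setup the query term appears only on `server_a`, so that collection is ranked first; smoothing keeps `server_b` at a finite, lower score instead of ruling it out entirely.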
|Name|CEUR Workshop Series|
|Workshop|7th Workshop on Large-Scale Distributed Systems for Information Retrieval|
|Period|23/07/09 → 23/07/09|
|Other|23 Jul 2009|
- Distributed Information Retrieval