Query-Based Sampling using Snippets

A.S. Tigelaar, Djoerd Hiemstra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

2 Citations (Scopus)
34 Downloads (Pure)

Abstract

Query-based sampling is a commonly used approach to model the content of servers. Conventionally, queries are sent to a server and the documents in the returned search results are downloaded in full as a representation of the server’s content. We present an approach that uses the document snippets in the search results as samples instead of downloading the entire documents. We show this yields equal or better modeling performance for the same bandwidth consumption, depending on collection characteristics such as document length distribution and homogeneity. Query-based sampling using snippets is a useful approach for real-world systems, since it requires no extra operations beyond exchanging queries and search results.
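
The sampling loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy in-memory server (`DOCS`, `search`), the snippet window size, the choice of the next query term uniformly at random from the terms seen so far, and the rule that an identical snippet of the same document is counted only once.

```python
import random
from collections import Counter

# Hypothetical toy "server": four short documents searched by exact term match.
DOCS = [
    "distributed information retrieval selects servers using resource descriptions",
    "query based sampling builds resource descriptions from search results",
    "snippets are short summaries shown in search results",
    "downloading full documents costs more bandwidth than reading snippets",
]


def search(term, top_n=2, window=3):
    """Return up to top_n (doc_id, snippet) pairs for documents containing term.

    The snippet is the first occurrence of the term plus up to `window`
    words on either side (an assumed snippet format).
    """
    results = []
    for doc_id, text in enumerate(DOCS):
        words = text.split()
        if term in words:
            i = words.index(term)
            snippet = " ".join(words[max(0, i - window): i + window + 1])
            results.append((doc_id, snippet))
    return results[:top_n]


def sample_with_snippets(seed_term, max_queries=10, rng=None):
    """Build a term-frequency model of the server from snippets only.

    No document is downloaded; only queries and search results are exchanged.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    model = Counter()
    seen = set()  # (doc_id, snippet) pairs already counted (assumed dedup rule)
    query = seed_term
    for _ in range(max_queries):
        for doc_id, snippet in search(query):
            if (doc_id, snippet) not in seen:
                seen.add((doc_id, snippet))
                model.update(snippet.split())
        if not model:
            break  # the seed term matched nothing, so there is no next query
        # Pick the next one-term query from the vocabulary learned so far.
        query = rng.choice(sorted(model))
    return model


if __name__ == "__main__":
    print(sample_with_snippets("snippets").most_common(5))
```

The conventional full-document variant would differ only in the inner loop, where each result would be fetched and added to the model in full, at a higher bandwidth cost per query.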
Original language: Undefined
Title of host publication: Eighth Workshop on Large-Scale Distributed Systems for Information Retrieval
Place of Publication: Aachen, Germany
Publisher: Association for Computing Machinery (ACM)
Pages: 9-14
Number of pages: 6
ISBN (Print): not assigned
Publication status: Published - 23 Jul 2010
Event: Eighth Workshop on Large-Scale Distributed Systems for Information Retrieval - Geneva, Switzerland
Duration: 23 Jul 2010 → 23 Jul 2010

Publication series

Name: CEUR Workshop Proceedings
Publisher: CEUR-WS
Volume: 630
ISSN (Print): 1613-0073

Workshop

Workshop: Eighth Workshop on Large-Scale Distributed Systems for Information Retrieval
Period: 23/07/10 → 23/07/10
Other: 23 Jul 2010

Keywords

  • METIS-270920
  • IR-72429
  • Distributed Information Retrieval
  • CR-H.3.4
  • query-based sampling
  • CR-H.3.3
  • EWI-18164
