A Case for Automatic System Evaluation

C. Hauff, Djoerd Hiemstra, Leif Azzopardi, Franciska M.G. de Jong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

19 Citations (Scopus)
13 Downloads (Pure)

Abstract

Ranking a set of retrieval systems according to their retrieval effectiveness without relying on relevance judgments was first explored by Soboroff et al. [13]. Over the years, a number of alternative approaches have been proposed, all of which have been evaluated on early TREC test collections. In this work, we perform a wider analysis of system ranking estimation methods on sixteen TREC data sets which cover more tasks and corpora than previously. Our analysis reveals that the performance of system ranking estimation approaches varies across topics. This observation motivates the hypothesis that the performance of such methods can be improved by selecting the “right” subset of topics from a topic set. We show that using topic subsets improves the performance of automatic system ranking methods by 26% on average, with a maximum of 60%. We also observe that the commonly experienced problem of underestimating the performance of the best systems is data set dependent and not inherent to system ranking estimation. These findings support the case for automatic system evaluation and motivate further research.
Original language: Undefined
Title of host publication: Advances in Information Retrieval: Proceedings of the 32nd European Conference on IR Research
Place of Publication: London
Publisher: Springer
Pages: 153-165
Number of pages: 13
ISBN (Print): 978-3-642-12274-3
DOIs
Publication status: Published - Apr 2010
Event: 32nd European Conference on IR Research - Milton Keynes, United Kingdom
Duration: 28 Mar 2010 → 31 Mar 2010

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Verlag
Volume: 5993/2010

Conference

Conference: 32nd European Conference on IR Research
Period: 28/03/10 → 31/03/10
Other: 28-31 March 2010

Keywords

  • IR-70844
  • METIS-270785
  • automatic system evaluation
  • Information Retrieval
  • EWI-17781
  • Query performance prediction
