Ensemble clustering for result diversification

Dong-Phuong Nguyen, Djoerd Hiemstra

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic

Abstract

This paper describes the participation of the University of Twente in the Web track of TREC 2012. Our baseline approach uses the Mirex toolkit, an open-source tool that sequentially scans all the documents. For result diversification, we experimented with improving the quality of clusters through ensemble clustering. We combined clusters obtained by different clustering methods (such as LDA and K-means) and clusters obtained by using different types of data (such as document text and anchor text). Our two-layer ensemble run performed better than the LDA-based diversification run and also better than a non-diversification run.
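
As a rough illustration of what combining clusterings can look like, the sketch below builds a co-association (evidence accumulation) consensus from several base clusterings. This is a generic ensemble technique and an assumption on our part; the abstract does not specify how the two-layer ensemble was built, so it should not be read as the authors' method. The names `co_association`, `consensus_clusters`, and the toy label lists (standing in for, say, LDA on document text, K-means on document text, and K-means on anchor text) are hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """For each document pair, the fraction of base clusterings that co-cluster it."""
    n = len(labelings[0])
    scores = {}
    for i, j in combinations(range(n), 2):
        agree = sum(1 for labels in labelings if labels[i] == labels[j])
        scores[(i, j)] = agree / len(labelings)
    return scores


def consensus_clusters(labelings, threshold=0.5):
    """Merge documents whose co-association exceeds the threshold (union-find)."""
    n = len(labelings[0])
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), score in co_association(labelings).items():
        if score > threshold:
            parent[find(j)] = find(i)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical base clusterings of five documents (illustrative labels only).
lda_text = [0, 0, 1, 1, 2]
kmeans_text = [1, 1, 0, 0, 0]
kmeans_anchor = [0, 0, 0, 1, 1]

print(consensus_clusters([lda_text, kmeans_text, kmeans_anchor]))
# prints [0, 0, 1, 1, 1]
```

Thresholding the co-association scores keeps only pairs that a majority of the base clusterings agree on, which is one simple way an ensemble can smooth over the errors of any single method or data source.
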

Original language: Undefined
Title of host publication: Proceedings of the Twenty First Text REtrieval Conference (TREC 2012)
Place of Publication: Gaithersburg, MD, USA
Publisher: NIST
Pages: 83
Number of pages: 4
ISBN (Print): not assigned
Publication status: Published - 6 Nov 2012
Event: Twenty-First Text REtrieval Conference, TREC-21 2012 - Gaithersburg, United States
Duration: 6 Nov 2012 to 9 Nov 2012
Conference number: 21

Publication series

Name: NIST Special Publications
Publisher: NIST
Number: SP 500-298
Volume: SP 500-298

Workshop

Workshop: Twenty-First Text REtrieval Conference, TREC-21 2012
Abbreviated title: TREC
Country: United States
City: Gaithersburg
Period: 6/11/12 to 9/11/12

Keywords

  • EWI-22911
  • IR-84255
  • METIS-296227