Abstract
This paper describes the participation of the University of Twente in the Web track of TREC 2012. Our baseline approach uses the Mirex toolkit, an open-source tool that sequentially scans all the documents. For result diversification, we experimented with improving the quality of clusters through ensemble clustering. We combined clusters obtained by different clustering methods (such as LDA and K-means) and clusters obtained by using different types of data (such as document text and anchor text). Our two-layer ensemble run performed better than the LDA-based diversification run and also better than a non-diversified run.
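The abstract does not say how the individual clusterings were merged. As an illustration only, the sketch below uses co-association voting, a common ensemble-clustering technique chosen here as an assumption and not necessarily the authors' method. The function name `consensus_clusters`, the `threshold` parameter, and the toy label lists are all hypothetical.

```python
from itertools import combinations


def consensus_clusters(labelings, threshold=0.5):
    """Combine several base clusterings by co-association voting.

    labelings: list of label lists, one per base clustering (for example
               LDA on text, K-means on text, K-means on anchor text),
               all covering the same documents in the same order.
    threshold: minimum fraction of base clusterings that must place two
               documents in the same cluster for them to be merged.
    Returns one consensus label per document.
    """
    n = len(labelings[0])
    parent = list(range(n))  # union-find forest over documents

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        agree = sum(1 for labels in labelings if labels[i] == labels[j])
        if agree / len(labelings) >= threshold:
            parent[find(j)] = find(i)

    # Relabel roots as consecutive ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Toy example: three base clusterings of five documents (made-up labels).
lda_text = [0, 0, 1, 1, 2]
kmeans_text = [1, 1, 0, 0, 0]
kmeans_anchor = [0, 0, 0, 1, 1]

print(consensus_clusters([lda_text, kmeans_text, kmeans_anchor], threshold=0.6))
# prints [0, 0, 1, 1, 1]
```

With a threshold of 0.6, two documents are merged when at least two of the three base clusterings agree on them, so a single noisy clustering cannot split or join a pair on its own.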
Original language | Undefined |
---|---|
Title of host publication | Proceedings of the Twenty First Text REtrieval Conference (TREC 2012) |
Place of Publication | Gaithersburg, MD, USA |
Publisher | NIST |
Pages | 83 |
Number of pages | 4 |
ISBN (Print) | not assigned |
Publication status | Published - 6 Nov 2012 |
Event | Twenty-First Text REtrieval Conference, TREC-21 2012 - Gaithersburg, United States; Duration: 6 Nov 2012 → 9 Nov 2012; Conference number: 21 |
Publication series
Name | NIST Special Publications |
---|---|
Publisher | NIST |
Number | SP 500-298 |
Volume | SP 500-298 |
Workshop
Workshop | Twenty-First Text REtrieval Conference, TREC-21 2012 |
---|---|
Abbreviated title | TREC |
Country/Territory | United States |
City | Gaithersburg |
Period | 6/11/12 → 9/11/12 |
Keywords
- EWI-22911
- IR-84255
- METIS-296227