Twenty-One at TREC-7: ad-hoc and cross-language track

Djoerd Hiemstra, Wessel Kraaij

Abstract

This paper describes the official runs of the Twenty-One group for TREC-7. The Twenty-One group participated in the ad-hoc and the cross-language track and made the following accomplishments: We developed a new weighting algorithm, which outperforms the popular Cornell version of BM25 on the ad-hoc collection. For the CLIR task we developed a fuzzy matching algorithm to recover from missing translations and spelling variants of proper names. Also for CLIR we investigated translation strategies that make extensive use of information from our dictionaries by identifying preferred translations, main translations and synonym translations, by defining weights of possible translations and by experimenting with probabilistic boolean matching strategies.
Original language: Undefined
Title of host publication: Proceedings of the seventh Text Retrieval Conference (TREC)
Editors: E.M. Voorhees, D.K. Harman
Place of publication: Gaithersburg, USA
Publisher: US National Institute of Standards and Technology
Pages: 227-238
Number of pages: 12
State: Published - 1999
Event: Seventh Text REtrieval Conference, TREC-7 1998 - Gaithersburg, United States

Publication series

Name: NIST Special Publications
Publisher: US National Institute of Standards and Technology
Volume: 500-242

Conference

Conference: Seventh Text REtrieval Conference, TREC-7 1998
Abbreviated title: TREC
Country: United States
City: Gaithersburg
Period: 9/11/98 - 11/11/98


Keywords

  • CR-H.3.3
  • IR-66980
  • EWI-9421
  • METIS-119693

Cite this

Hiemstra, D., & Kraaij, W. (1999). Twenty-One at TREC-7: ad-hoc and cross-language track. In E. M. Voorhees, & D. K. Harman (Eds.), Proceedings of the seventh Text Retrieval Conference (TREC) (pp. 227-238). (NIST Special Publications; Vol. 500-242). Gaithersburg, USA: US National Institute of Standards and Technology.

Hiemstra, Djoerd; Kraaij, Wessel / Twenty-One at TREC-7: ad-hoc and cross-language track.

Proceedings of the seventh Text Retrieval Conference (TREC). ed. / E.M Voorhees; D.K. Harman. Gaithersburg, USA : US National Institute of Standards and Technology, 1999. p. 227-238 (NIST Special Publications; Vol. 500-242).

Research output: Scientific, Conference contribution

@inproceedings{1dd7fcbf112548a4b1a2697d9f008f3e,
title = "{Twenty-One} at {TREC-7}: ad-hoc and cross-language track",
abstract = "This paper describes the official runs of the Twenty-One group for TREC-7. The Twenty-One group participated in the ad-hoc and the cross-language track and made the following accomplishments: We developed a new weighting algorithm, which outperforms the popular Cornell version of BM25 on the ad-hoc collection. For the CLIR task we developed a fuzzy matching algorithm to recover from missing translations and spelling variants of proper names. Also for CLIR we investigated translation strategies that make extensive use of information from our dictionaries by identifying preferred translations, main translations and synonym translations, by defining weights of possible translations and by experimenting with probabilistic boolean matching strategies.",
keywords = "CR-H.3.3, IR-66980, EWI-9421, METIS-119693",
author = "Djoerd Hiemstra and Wessel Kraaij",
year = "1999",
series = "NIST Special Publications",
volume = "500-242",
publisher = "US National Institute of Standards and Technology",
address = "Gaithersburg, USA",
pages = "227--238",
editor = "E.M. Voorhees and D.K. Harman",
booktitle = "Proceedings of the seventh Text Retrieval Conference (TREC)"
}

Hiemstra, D & Kraaij, W 1999, Twenty-One at TREC-7: ad-hoc and cross-language track. in EM Voorhees & DK Harman (eds), Proceedings of the seventh Text Retrieval Conference (TREC). NIST Special Publications, vol. 500-242, US National Institute of Standards and Technology, Gaithersburg, USA, pp. 227-238, Seventh Text REtrieval Conference, TREC-7 1998, Gaithersburg, United States, 9-11 November.

Twenty-One at TREC-7: ad-hoc and cross-language track. / Hiemstra, Djoerd; Kraaij, Wessel.

Proceedings of the seventh Text Retrieval Conference (TREC). ed. / E.M Voorhees; D.K. Harman. Gaithersburg, USA : US National Institute of Standards and Technology, 1999. p. 227-238 (NIST Special Publications; Vol. 500-242).


TY - CHAP

T1 - Twenty-One at TREC-7: ad-hoc and cross-language track

AU - Hiemstra, Djoerd

AU - Kraaij, Wessel

PY - 1999

Y1 - 1999

N2 - This paper describes the official runs of the Twenty-One group for TREC-7. The Twenty-One group participated in the ad-hoc and the cross-language track and made the following accomplishments: We developed a new weighting algorithm, which outperforms the popular Cornell version of BM25 on the ad-hoc collection. For the CLIR task we developed a fuzzy matching algorithm to recover from missing translations and spelling variants of proper names. Also for CLIR we investigated translation strategies that make extensive use of information from our dictionaries by identifying preferred translations, main translations and synonym translations, by defining weights of possible translations and by experimenting with probabilistic boolean matching strategies.

AB - This paper describes the official runs of the Twenty-One group for TREC-7. The Twenty-One group participated in the ad-hoc and the cross-language track and made the following accomplishments: We developed a new weighting algorithm, which outperforms the popular Cornell version of BM25 on the ad-hoc collection. For the CLIR task we developed a fuzzy matching algorithm to recover from missing translations and spelling variants of proper names. Also for CLIR we investigated translation strategies that make extensive use of information from our dictionaries by identifying preferred translations, main translations and synonym translations, by defining weights of possible translations and by experimenting with probabilistic boolean matching strategies.

KW - CR-H.3.3

KW - IR-66980

KW - EWI-9421

KW - METIS-119693

M3 - Conference contribution

T3 - NIST Special Publications

SP - 227

EP - 238

BT - Proceedings of the seventh Text Retrieval Conference (TREC)

PB - US National Institute of Standards and Technology

ER -

Hiemstra D, Kraaij W. Twenty-One at TREC-7: ad-hoc and cross-language track. In Voorhees EM, Harman DK, editors, Proceedings of the seventh Text Retrieval Conference (TREC). Gaithersburg, USA: US National Institute of Standards and Technology. 1999. p. 227-238. (NIST Special Publications).