Retrieving Web Pages using Content, Links, URLs and Anchors

T.H.W. Westerveld, Wessel Kraaij, Djoerd Hiemstra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

For this year’s web track, we concentrated on the entry page finding task. For the content-only runs, in both the ad hoc task and the entry page finding task, we used an information retrieval system based on a simple unigram language model. In the ad hoc task, we experimented with alternative approaches to smoothing. For the entry page task, we incorporated additional information into the model: besides the document’s content, we used links, URLs and anchors. We found that almost every approach can improve on the results of a content-only run. In the end, a very basic approach, using the depth of the URL path as a prior, yielded by far the largest improvement over the content-only results.
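
The idea of combining a content-only language model score with a URL-depth prior can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which smoothing method or prior values were used, so the linear interpolation smoothing, the weight `lam`, the per-level `decay` factor, and all function names below are assumptions chosen for illustration.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def url_depth(url: str) -> int:
    """Count the non-empty path segments of a URL (0 for a site root)."""
    path = urlparse(url).path
    return len([seg for seg in path.split("/") if seg])


def log_prior(url: str, decay: float = 0.5) -> float:
    """Hypothetical prior: each extra path level multiplies the prior by `decay`."""
    return url_depth(url) * math.log(decay)


def log_likelihood(query, doc_tokens, coll_counts, coll_size, lam=0.5):
    """Query log-likelihood under a unigram model, smoothed by linear
    interpolation with the collection model (an assumed smoothing choice)."""
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts[term] / coll_size if coll_size else 0.0
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            # The term occurs nowhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs):
    """Rank URLs by log-likelihood plus log-prior.

    `docs` maps each URL to its list of content tokens.
    """
    coll_counts = Counter(tok for toks in docs.values() for tok in toks)
    coll_size = sum(coll_counts.values())
    scored = {
        url: log_likelihood(query, toks, coll_counts, coll_size) + log_prior(url)
        for url, toks in docs.items()
    }
    return sorted(scored, key=scored.get, reverse=True)
```

With two pages of identical content, the prior alone decides the order in this sketch, so a site root such as `http://example.org/` ranks above `http://example.org/a/b/page.html`. That is the intuition behind using URL depth for entry page finding.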

Original language: English
Title of host publication: Proceedings of the Tenth Text REtrieval Conference (TREC 2001)
Editors: E.M. Voorhees, D.K. Harman
Place of Publication: Gaithersburg, Maryland, USA
Publisher: National Institute of Standards and Technology (NIST)
Pages: 663-672
Number of pages: 10
Publication status: Published - 2002
Event: Tenth Text REtrieval Conference, TREC-10 2001 - Gaithersburg, United States
Duration: 13 Nov 2001 to 16 Nov 2001
Conference number: 10

Publication series

Name: NIST Special Publication
Publisher: National Institute of Standards and Technology (NIST)
Number: 500-250
Volume: SP 500

Conference

Conference: Tenth Text REtrieval Conference, TREC-10 2001
Abbreviated title: TREC
Country: United States
City: Gaithersburg
Period: 13/11/01 to 16/11/01

Keywords

  • DB-IR: INFORMATION RETRIEVAL
  • EWI-7365
  • METIS-204321
  • IR-66475

Cite this

Westerveld, T. H. W., Kraaij, W., & Hiemstra, D. (2002). Retrieving Web Pages using Content, Links, URLs and Anchors. In E. M. Voorhees, & D. K. Harman (Eds.), Proceedings of the Tenth Text REtrieval Conference (TREC 2001) (pp. 663-672). (NIST Special Publication; Vol. SP 500, No. 500-250). Gaithersburg, Maryland, USA: National Institute of Standards and Technology (NIST).