The Importance of Prior Probabilities for Entry Page Search

W. Kraaij, T.H.W. Westerveld, Djoerd Hiemstra

Research output: Contribution to conference > Paper > peer-review

Abstract

An important class of searches on the world-wide web aims to find the entry page (homepage) of an organisation. Entry page search is quite different from Ad Hoc search; indeed, a plain Ad Hoc system performs disappointingly. We explored three non-content features of web pages: page length, number of incoming links, and URL form. The URL form in particular proved to be a good predictor. Using URL form priors, we found over 70% of all entry pages at rank 1 and up to 89% in the top 10. Non-content features can easily be embedded in a language model framework as a prior probability.
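
As an illustration of how a non-content feature might enter a language model ranker as a prior, the following is a minimal sketch, not the authors' implementation. It ranks a page by log P(D) + sum over query terms of log P(t|D), where P(D) depends only on the URL form. The URL classes (`root`, `subroot`, `path`, `file`), the prior values in `URL_PRIOR`, the linear-interpolation smoothing, and the function names are all assumptions chosen for the example; none of the numbers come from the paper.

```python
import math
from collections import Counter

# Hypothetical priors P(entry page | URL form). Illustrative values only,
# not estimates reported in the paper.
URL_PRIOR = {"root": 0.6, "subroot": 0.25, "path": 0.1, "file": 0.05}


def url_form(url):
    """Classify a URL into a coarse form class (an assumed scheme)."""
    rest = url.split("://", 1)[-1]                 # drop the scheme
    path = rest.split("/", 1)[1] if "/" in rest else ""
    if path in ("", "index.html"):
        return "root"                              # bare domain
    if not path.endswith("/"):
        return "file"                              # ends in a file name
    depth = path.strip("/").count("/") + 1         # number of directories
    return "subroot" if depth == 1 else "path"


def score(query, doc_tokens, coll_counts, coll_len, url, lam=0.5):
    """Log prior plus smoothed query log-likelihood for one document."""
    tf = Counter(doc_tokens)
    dl = len(doc_tokens)
    total = math.log(URL_PRIOR[url_form(url)])     # log P(D)
    for term in query:
        p_doc = tf[term] / dl if dl else 0.0
        p_col = coll_counts.get(term, 0) / coll_len
        p = lam * p_doc + (1 - lam) * p_col        # interpolated P(t|D)
        if p == 0.0:
            return float("-inf")                   # term unseen everywhere
        total += math.log(p)
    return total
```

With identical page content, a page whose URL is a bare domain would outrank one whose URL ends in a file name, because only the prior term differs. Replacing `URL_PRIOR` with a prior based on page length or on the number of incoming links would follow the same pattern.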
Original language: Undefined
Pages: 27-34
Number of pages: 8
Publication status: Published - Aug 2002
Event: 25th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2002 - Tampere, Finland
Duration: 11 Aug 2002 to 15 Aug 2002
Conference number: 25
http://sigir.org/sigir2002/

Conference

Conference: 25th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2002
Abbreviated title: SIGIR
Country/Territory: Finland
City: Tampere
Period: 11/08/02 to 15/08/02

Keywords

  • DB-IR: INFORMATION RETRIEVAL
  • EWI-7274
  • IR-63507