Abstract
An important class of searches on the World Wide Web aims to find the entry page (homepage) of an organisation. Entry page search differs markedly from ad hoc search, and a plain ad hoc system performs disappointingly on it. We explored three non-content features of web pages: page length, number of incoming links, and URL form. The URL form in particular proved to be a good predictor: using URL form priors, we found over 70% of all entry pages at rank 1 and up to 89% in the top 10. Non-content features can easily be embedded in a language model framework as a prior probability.
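As a rough illustration of the last point, a document prior can be added to a query-likelihood score, so that ranking follows P(D|Q) ∝ P(D) · ∏ P(q_i|D). The sketch below is not the authors' implementation: the URL-form classes (`root`, `directory`, `file`), the prior values in `PRIOR`, the smoothing weight `lam`, and the function names are all assumptions made for this example.

```python
import math
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical prior per URL form; the values are illustrative only
# and are not taken from the paper.
PRIOR = {"root": 0.6, "directory": 0.3, "file": 0.1}


def url_form(url: str) -> str:
    """Classify a URL into an assumed URL-form class."""
    path = urlsplit(url).path.lstrip("/")
    if path in ("", "index.html"):
        return "root"
    if path.endswith("/"):
        return "directory"
    return "file"


def score(query, doc_tokens, url, collection_tokens, lam=0.5):
    """Return log P(D) + sum_i log(lam * P(q_i|D) + (1 - lam) * P(q_i|C))."""
    tf = Counter(doc_tokens)
    cf = Counter(collection_tokens)
    # The non-content feature enters only through this prior term.
    log_p = math.log(PRIOR[url_form(url)])
    for term in query:
        p_doc = tf[term] / len(doc_tokens)
        p_col = cf[term] / len(collection_tokens)
        # Floor avoids log(0) for terms unseen in the whole collection.
        log_p += math.log(max(lam * p_doc + (1 - lam) * p_col, 1e-12))
    return log_p
```

Two pages with identical text then differ in score only by the log-ratio of their priors, which is how a URL-based prior can lift a likely entry page above a deeper page with the same content.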
Original language | Undefined |
---|---|
Pages | 27-34 |
Number of pages | 8 |
Publication status | Published - Aug 2002 |
Event | 25th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2002, Tampere, Finland. Duration: 11 Aug 2002 → 15 Aug 2002. Conference number: 25. http://sigir.org/sigir2002/ |
Conference
Conference | 25th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2002 |
---|---|
Abbreviated title | SIGIR |
Country/Territory | Finland |
City | Tampere |
Period | 11/08/02 → 15/08/02 |
Keywords
- DB-IR: INFORMATION RETRIEVAL
- EWI-7274
- IR-63507