Abstract
An important class of searches on the World Wide Web aims to find the entry page (homepage) of an organisation. Entry page search is quite different from ad hoc search; indeed, a plain ad hoc system performs disappointingly on this task. We explored three non-content features of web pages: page length, number of incoming links, and URL form. The URL form in particular proved to be a good predictor. Using URL form priors, we found over 70% of all entry pages at rank 1 and up to 89% in the top 10. Non-content features can easily be embedded in a language model framework as a prior probability.
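The last point of the abstract can be illustrated with a minimal sketch: rank each page by log P(D) + Σ log P(t|D), where P(D) is a prior that depends only on the URL form. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the four URL classes, the prior values (placeholders, not reported results), the linear smoothing weight `lam`, and the helper names `url_form` and `score`.

```python
import math
from collections import Counter

# Hypothetical priors per URL form. Placeholder values for illustration only;
# in practice they would be estimated from training data.
URL_FORM_PRIOR = {"root": 0.6, "subroot": 0.25, "path": 0.1, "file": 0.05}


def url_form(url: str) -> str:
    """Classify a URL by its shape (an assumed, deliberately crude scheme)."""
    # Drop the scheme, then keep everything after the first slash.
    path = url.split("://", 1)[-1].partition("/")[2]
    if path in ("", "index.html"):
        return "root"      # bare domain, optionally followed by index.html
    if path.endswith("/"):
        # One directory level is a "subroot", deeper levels are a "path".
        return "subroot" if path.count("/") == 1 else "path"
    return "file"          # anything that ends in a file name


def score(query_terms, doc_terms, collection_tf, collection_len, url, lam=0.5):
    """Return log P(D) + sum of log P(t|D) over the query terms.

    P(t|D) is a unigram document model linearly smoothed with the
    collection model; P(D) is the URL-form prior.
    """
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    log_p = math.log(URL_FORM_PRIOR[url_form(url)])
    for term in query_terms:
        p_doc = tf[term] / doc_len if doc_len else 0.0
        p_col = collection_tf.get(term, 0) / collection_len
        p = lam * p_doc + (1 - lam) * p_col
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        log_p += math.log(p)
    return log_p


# Usage: two pages with identical text differ only through the prior.
collection_tf = {"acme": 2, "home": 1, "products": 1}
text = ["acme", "home"]
root_score = score(["acme"], text, collection_tf, 4, "http://acme.example/")
file_score = score(["acme"], text, collection_tf, 4, "http://acme.example/a/b.html")
```

Because the prior enters as an additive term in log space, the content model is left untouched, and a different non-content feature (page length or incoming link count) could be swapped in by replacing only the prior lookup.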
| Pages | 27-34 |
|---|---|
| Number of pages | 8 |
| Publication status | Published - Aug 2002 |
| Event | 25th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2002 - Tampere, Finland. Duration: 11 Aug 2002 → 15 Aug 2002. Conference number: 25. http://sigir.org/sigir2002/ |
Conference
| Conference | 25th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2002 |
|---|---|
| Abbreviated title | SIGIR |
| Country/Territory | Finland |
| City | Tampere |
| Period | 11 Aug 2002 → 15 Aug 2002 |
| Internet address | http://sigir.org/sigir2002/ |
Keywords
- DB-IR: INFORMATION RETRIEVAL
- EWI-7274
- IR-63507