Abstract
Original language  English 

Awarding Institution 

Supervisors/Advisors 

Award date  19 Jan 2001 
Place of Publication  Enschede 
Publisher  Taaluitgeverij Neslia Paniculata
Print ISBNs  9075296053 
Publication status  Published  19 Jan 2001 
Keywords
 IR36473
 METIS202069
 EWI6563
Cite this
Using Language Models for Information Retrieval. / Hiemstra, Djoerd.
Enschede: Taaluitgeverij Neslia Paniculata, 2001. 164 p.
Research output: Thesis › PhD Thesis - Research UT, graduation UT › Academic
TY  - THES
T1  - Using Language Models for Information Retrieval
AU  - Hiemstra, Djoerd
N1  - Imported from HMI
PY  - 2001/1/19
Y1  - 2001/1/19
N2  - Because of the World Wide Web, information retrieval systems are now used by millions of untrained users all over the world. The search engines that perform the information retrieval tasks often retrieve thousands of potentially interesting documents in response to a query. To be useful to the user, the documents should be ranked in decreasing order of relevance. This book describes a mathematical model of information retrieval based on the use of statistical language models. The approach uses simple document-based unigram models to compute, for each document, the probability that it generates the query. This probability is used to rank the documents. The study makes the following research contributions:
* The development of a model that integrates term weighting, relevance feedback and structured queries.
* The development of a model that supports multiple representations of a request or information need by integrating a statistical translation model.
* The development of a model that supports multiple representations of a document, for instance by allowing proximity searches or searches for terms from a particular record field (e.g. a search for terms from the title).
* A mathematical interpretation of stop word removal and stemming.
* A mathematical interpretation of operators for mandatory terms, wildcards and synonyms.
* A practical comparison, in a controlled experiment, of a language model-based retrieval system with similar systems that are based on well-established models and term weighting algorithms.
* The application of the model to cross-language information retrieval and adaptive information filtering, and the evaluation of two prototype systems in a controlled experiment.
Experimental results on three standard tasks show that the language model-based algorithms work as well as, or better than, today's top-performing retrieval algorithms.
The standard tasks investigated are ad hoc retrieval (when there are no previously retrieved documents to guide the search), retrospective relevance weighting (finding the optimum model for a given set of relevant documents), and ad hoc retrieval using manually formulated Boolean queries. The applications to cross-language retrieval and adaptive filtering show the practical use of structured queries and relevance feedback, respectively.
KW  - IR36473
KW  - METIS202069
KW  - EWI6563
M3  - PhD Thesis - Research UT, graduation UT
SN  - 9075296053
T3  - CTIT Ph.D. thesis series
PB  - Taaluitgeverij Neslia Paniculata
CY  - Enschede
ER  - 
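The abstract summarizes the core ranking principle: each document gets its own unigram language model, and documents are ranked by the probability that this model generates the query. The Python sketch below illustrates that principle on a toy collection. It is not the thesis's implementation: the smoothing scheme (linear interpolation of the document model with a collection-wide model), the weight `lam`, the function name `rank_by_query_likelihood`, and the example documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def rank_by_query_likelihood(query, docs, lam=0.5):
    """Rank documents by log P(query | document model), highest first.

    Assumptions (not taken from the thesis record): each document's unigram
    model is linearly interpolated with a collection-wide unigram model, and
    `lam` is a hypothetical weight on the document model, with 0 <= lam < 1.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")

    doc_counts = {name: Counter(tokens) for name, tokens in docs.items()}
    coll_counts = Counter()
    for counts in doc_counts.values():
        coll_counts.update(counts)
    coll_len = sum(coll_counts.values())

    scores = {}
    for name, counts in doc_counts.items():
        doc_len = sum(counts.values())
        score = 0.0
        for term in query:
            # Maximum-likelihood estimates from the document and the collection.
            p_doc = counts[term] / doc_len if doc_len else 0.0
            p_coll = coll_counts[term] / coll_len if coll_len else 0.0
            p = lam * p_doc + (1 - lam) * p_coll
            if p == 0.0:
                # Only possible when the term is absent from the whole
                # collection; it zeroes every document equally, so skipping
                # it leaves the ranking unchanged.
                continue
            # Sum logs instead of multiplying probabilities to avoid underflow.
            score += math.log(p)
        scores[name] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy collection, invented for illustration only.
docs = {
    "d1": "language models for information retrieval".split(),
    "d2": "boolean queries and term weighting".split(),
    "d3": "statistical language models of speech".split(),
}
ranking = rank_by_query_likelihood(["language", "retrieval"], docs)
print(ranking)
```

On this toy data, d1 (which contains both query terms) ranks first, d3 (which contains only "language") second, and d2 (which contains neither) last. Mixing in the collection model is what keeps d2 and d3 from scoring zero when a query term is missing from the document.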