Exploiting Query Structure and Document Structure to Improve Document Retrieval Effectiveness

V. Mihajlovic, Djoerd Hiemstra, H.E. Blok, Peter M.G. Apers

Research output: Book/Report › Report › Professional

Abstract

In this paper we present a systematic analysis of document retrieval using unstructured and structured queries within the score region algebra (SRA) structured retrieval framework. The behavior of different retrieval models, namely Boolean, tf.idf, GPX, language models, and Okapi, is tested using the transparent SRA framework in our three-level structured retrieval system called TIJAH. The retrieval models are implemented along four elementary retrieval aspects: element and term selection, element score computation, score combination, and score propagation. The analysis is based on numerous experiments evaluated on TREC and CLEF collections, using manually generated unstructured and structured queries. Unstructured queries range from short title-only queries to long title + description + narrative queries. To generate structured queries, we exploit knowledge of the document structure and of the content used to semantically describe or classify documents. We show that retrieval engines can use such structural information to give more precise answers to user queries than unstructured queries yield.
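The four retrieval aspects named above can be illustrated with a small, self-contained sketch. It assumes a hypothetical toy collection of documents split into tagged elements, and it uses common textbook formulations of Boolean, tf.idf, Jelinek-Mercer smoothed language-model, and Okapi BM25 scoring. These are not necessarily the exact SRA/TIJAH definitions, GPX is not modeled, and all names (`DOCS`, `rank`, `MODELS`, the parameters `lam`, `k1`, `b`) are illustrative assumptions rather than the system's actual API.

```python
import math
from collections import Counter

# Hypothetical toy collection: each document is a list of (element_tag, text) pairs.
DOCS = {
    "d1": [("title", "xml retrieval"), ("body", "structured retrieval of xml documents")],
    "d2": [("title", "language models"), ("body", "language models for retrieval")],
}

# Flatten the collection into elements: (doc_id, tag, term counts).
ELEMS = [(d, tag, Counter(text.split())) for d, parts in DOCS.items() for tag, text in parts]
N = len(ELEMS)                                           # number of elements
AVG_LEN = sum(sum(c.values()) for _, _, c in ELEMS) / N  # average element length
COLL = Counter()                                         # collection-wide term counts
for _, _, c in ELEMS:
    COLL.update(c)
COLL_LEN = sum(COLL.values())

def df(term):
    """Number of elements containing the term."""
    return sum(1 for _, _, c in ELEMS if term in c)

# Element score computation: textbook per-term weights (assumed, not TIJAH's exact variants).
def boolean(term, counts):
    return 1.0 if counts[term] else 0.0

def tfidf(term, counts):
    tf = counts[term]
    return tf * math.log(N / df(term)) if tf else 0.0

def lm(term, counts, lam=0.5):
    # Jelinek-Mercer smoothing: mix element and collection term probabilities.
    length = sum(counts.values())
    return lam * counts[term] / length + (1 - lam) * COLL[term] / COLL_LEN

def bm25(term, counts, k1=1.2, b=0.75):
    tf = counts[term]
    if not tf:
        return 0.0
    length = sum(counts.values())
    idf = math.log((N - df(term) + 0.5) / (df(term) + 0.5) + 1)
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * length / AVG_LEN))

# Score combination per model: AND (min) for Boolean, product for LM, sum otherwise.
MODELS = {
    "boolean": (boolean, min),
    "tfidf": (tfidf, sum),
    "lm": (lm, math.prod),
    "bm25": (bm25, sum),
}

def rank(query, model, tag=None):
    score_fn, combine = MODELS[model]
    terms = query.split()
    doc_scores = {}
    for doc_id, etag, counts in ELEMS:
        # Element selection: a structured query restricts matches to one element tag.
        if tag is not None and etag != tag:
            continue
        # Term selection: skip elements that contain none of the query terms.
        if not any(t in counts for t in terms):
            continue
        # Element score computation followed by score combination over query terms.
        s = combine(score_fn(t, counts) for t in terms)
        # Score propagation: a document inherits the score of its best element.
        doc_scores[doc_id] = max(doc_scores.get(doc_id, 0.0), s)
    return sorted(doc_scores.items(), key=lambda kv: -kv[1])
```

For example, `rank("xml retrieval", "bm25", tag="body")` behaves like a structured query that only considers `body` elements, while `rank("xml retrieval", "bm25")` is its unstructured counterpart. Propagating the maximum element score to the document is one simple choice; summing or averaging element scores are equally plausible alternatives.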
Original language: Undefined
Place of Publication: Enschede
Publisher: Centre for Telematics and Information Technology (CTIT)
Number of pages: 10
Publication status: Published - 1 Oct 2006

Publication series

Name: CTIT Technical Report Series
Publisher: Centre for Telematics and Information Technology, University of Twente
No.: 06-57
ISSN (Print): 1381-3625

Keywords

  • DB-XMLIR: XML INFORMATION RETRIEVAL
  • EWI-6918
  • IR-66353
  • METIS-238677

Cite this

Mihajlovic, V., Hiemstra, D., Blok, H. E., & Apers, P. M. G. (2006). Exploiting Query Structure and Document Structure to Improve Document Retrieval Effectiveness. (CTIT Technical Report Series; No. 06-57). Enschede: Centre for Telematics and Information Technology (CTIT).