Exploiting Query Structure and Document Structure to Improve Document Retrieval Effectiveness

V. Mihajlovic, Djoerd Hiemstra, H.E. Blok, Peter M.G. Apers

Research output: Book/Report › Report

Abstract

In this paper we present a systematic analysis of document retrieval using unstructured and structured queries within the score region algebra (SRA) structured retrieval framework. The behavior of different retrieval models, namely Boolean, tf.idf, GPX, language models, and Okapi, is tested using the transparent SRA framework in our three-level structured retrieval system, TIJAH. The retrieval models are implemented along four elementary retrieval aspects: element and term selection, element score computation, score combination, and score propagation. The analysis is based on numerous experiments on TREC and CLEF collections, using manually generated unstructured and structured queries. Unstructured queries range from short title queries to long title + description + narrative queries. To generate structured queries, we exploit knowledge of the document structure and of the content used to semantically describe or classify documents. We show that retrieval engines can use such structured information to give more precise answers to user queries than they give with unstructured queries.
Language: Undefined
Place of publication: Enschede
Publisher: Centre for Telematics and Information Technology (CTIT)
Number of pages: 10
State: Published - 1 Oct 2006

Publication series

Name: CTIT Technical Report Series
Publisher: Centre for Telematics and Information Technology, University of Twente
No.: 06-57
ISSN (Print): 1381-3625

Keywords

  • DB-XMLIR: XML INFORMATION RETRIEVAL
  • EWI-6918
  • IR-66353
  • METIS-238677

Cite this

Mihajlovic, V., Hiemstra, D., Blok, H. E., & Apers, P. M. G. (2006). Exploiting Query Structure and Document Structure to Improve Document Retrieval Effectiveness. (CTIT Technical Report Series; No. 06-57). Enschede: Centre for Telematics and Information Technology (CTIT).
@book{90492084d9df4685bcdd1e6ebf9b7932,
title = "Exploiting Query Structure and Document Structure to Improve Document Retrieval Effectiveness",
abstract = "In this paper we present a systematic analysis of document retrieval using unstructured and structured queries within the score region algebra (SRA) structured retrieval framework. The behavior of different retrieval models, namely Boolean, tf.idf, GPX, language models, and Okapi, is tested using the transparent SRA framework in our three-level structured retrieval system, TIJAH. The retrieval models are implemented along four elementary retrieval aspects: element and term selection, element score computation, score combination, and score propagation. The analysis is based on numerous experiments on TREC and CLEF collections, using manually generated unstructured and structured queries. Unstructured queries range from short title queries to long title + description + narrative queries. To generate structured queries, we exploit knowledge of the document structure and of the content used to semantically describe or classify documents. We show that retrieval engines can use such structured information to give more precise answers to user queries than they give with unstructured queries.",
keywords = "DB-XMLIR: XML INFORMATION RETRIEVAL, EWI-6918, IR-66353, METIS-238677",
author = "V. Mihajlovic and Djoerd Hiemstra and H.E. Blok and Apers, {Peter M.G.}",
year = "2006",
month = "10",
day = "1",
language = "Undefined",
series = "CTIT Technical Report Series",
publisher = "Centre for Telematics and Information Technology (CTIT)",
number = "06-57",
address = "Netherlands",

}



