To find information on the World-Wide Web (WWW), two approaches are generally followed. Browsing the web from a specific starting point, or web-site map, is called search by divergence. The second approach, search by convergence, is followed when using a search engine. Most search engines use an information retrieval strategy, which requires the user to supply keywords to find the relevant information. Due to the diversity and lack of structure of the WWW, both approaches offer only limited query formulation techniques. Even when focusing on smaller domains of the Internet, large collections of documents still have to be dealt with, presented on a single web-site or Intranet. There the content is more related and structured, which allows us to apply database techniques to the web. The Webspace Method aims at using database (DB) techniques to model and query such document collections. A semantic level of abstraction is obtained by describing the content of the documents with high-level concepts, defined in an object-oriented schema. This allows us to bring the power of query formulation as known within a database environment to the web. At the same time, we focus on the integration with Information Retrieval, which allows us to formulate complex content-based queries over a collection of web-based documents containing various types of multimedia. After an introduction to the Webspace Method, the focus in this article is on the formulation of complex queries over a collection of related multimedia documents, also called a webspace. For that purpose the Webspace Search Engine was built, which combines search by both divergence and convergence to formulate the query, using a graphical representation of the webspace schema. Under the hood, the Webspace Search Engine uses the Data eXchange Language (DXL) to gather the requested information.
We explain DXL's framework for data exchange and discuss how it is integrated into the Webspace Search Engine. Furthermore, we show by example how, with the help of DXL, specific parts of documents can be retrieved and integrated into the result of the query, based on the concepts defined in the webspace schema. This is in contrast to the average search engine, which just delivers a document's URL.
|Publisher||University of Twente|
|Number of pages||14|
|Publication status||Published - Dec 2001|
|Name||CTIT Technical Report Series|