Distributed Deep Web Search

Kien Tjin-Kam-Jet

Research output: Thesis › PhD Thesis - Research UT, graduation UT › Academic

Abstract

The World Wide Web contains billions of documents (and counting), so it is likely that some document contains the answer or content you are searching for. Major search engines like Bing and Google often manage to return relevant results for a query, but in many situations they are less capable of doing so. Specifically, they fall noticeably short when the data must be retrieved from the deep web. Deep web data is difficult for today’s web search engines to crawl and index, largely because it must be accessed through complex web forms. Yet deep web data can be highly relevant to the end-user’s information need. This thesis gives an overview of the problems, solutions, and paradigms for deep web search. It also proposes a new paradigm to overcome the apparent limitations of the current state of deep web search, and makes the following scientific contributions:

1. A more specific classification scheme for deep web search systems, which better illustrates the differences and variation between these systems.
2. Virtual surfacing, a new and, in our opinion, better deep web search paradigm. It tries to combine the benefits of the two existing paradigms, surfacing and virtual integration, and it also raises new research opportunities.
3. A stack decoding approach that combines rules and statistical usage information to interpret the end-user’s free-text query, and then derives filled-out web forms from that interpretation.
4. A practical comparison of the developed approach against a well-established text-processing toolkit.
5. Empirical evidence that, for a single site, end-users prefer the proposed free-text search interface over a complex web form.

Analysis of data obtained from user studies shows that the stack decoding approach works as well as, or better than, today’s top-performing alternatives.
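Contribution 3 interprets a free-text query with stack decoding. The sketch below illustrates only generic stack decoding, the search strategy known from phrase-based machine translation. It is not the thesis's implementation.

Everything concrete in it is hypothetical: the travel-form fields (`origin`, `destination`, `date`), the rules, the city list, and every probability. Hypotheses are grouped into stacks by how many query tokens they account for. Each hypothesis is extended in one of two ways: by skipping a token, or by applying a rule that fills a form field. The score adds the log of the rule's assumed usage probability.

```python
import math
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# A partial form: sorted tuple of (field, value) pairs, hashable for recombination.
Form = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class Rule:
    field: str                      # hypothetical form field the rule fills in
    keyword: Optional[str]          # optional trigger word that precedes the value
    matches: Callable[[str], bool]  # does a token qualify as a value for this field?
    usage_prob: float               # hypothetical usage statistic for this rule


CITIES = {"amsterdam", "paris", "enschede"}  # hypothetical value list


def is_city(token: str) -> bool:
    return token in CITIES


def is_date(token: str) -> bool:
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", token) is not None


# Hypothetical rules and probabilities for an imagined travel search form.
RULES = [
    Rule("origin", "from", is_city, 0.6),
    Rule("destination", "to", is_city, 0.6),
    Rule("destination", None, is_city, 0.3),  # assume a bare city is usually a destination
    Rule("origin", None, is_city, 0.1),
    Rule("date", None, is_date, 0.9),
]
SKIP_PROB = 0.05  # assumed cost of leaving a token uninterpreted


def push(stack: Dict[Form, float], form: Form, score: float) -> None:
    """Recombination: keep only the best score for identical partial forms."""
    if score > stack.get(form, -math.inf):
        stack[form] = score


def decode(query: str, beam: int = 5) -> Tuple[Dict[str, str], float]:
    tokens = query.lower().split()
    n = len(tokens)
    # stacks[i] holds partial forms that account for the first i tokens.
    stacks: List[Dict[Form, float]] = [{} for _ in range(n + 1)]
    stacks[0][()] = 0.0
    for i in range(n):
        # Prune: expand only the `beam` best hypotheses in this stack.
        best = sorted(stacks[i].items(), key=lambda kv: kv[1], reverse=True)[:beam]
        for form, score in best:
            filled = {field for field, _ in form}
            # Option 1: skip token i.
            push(stacks[i + 1], form, score + math.log(SKIP_PROB))
            # Option 2: a rule consumes token i (and the next one if keyword-triggered).
            for rule in RULES:
                if rule.field in filled:
                    continue  # each field is filled at most once
                if rule.keyword is None:
                    end, value = i + 1, tokens[i]
                elif tokens[i] == rule.keyword and i + 1 < n:
                    end, value = i + 2, tokens[i + 1]
                else:
                    continue
                if rule.matches(value):
                    new_form = tuple(sorted(form + ((rule.field, value),)))
                    push(stacks[end], new_form, score + math.log(rule.usage_prob))
    # Every stack is non-empty because skipping is always possible.
    form, score = max(stacks[n].items(), key=lambda kv: kv[1])
    return dict(form), score
```

Under these made-up numbers, `decode("from enschede to amsterdam 2013-12-19")` returns a form with origin `enschede`, destination `amsterdam` and date `2013-12-19`. A bare `paris` is read as a destination.

Two mechanisms keep the search tractable: per-stack pruning (`beam`) and recombination in `push`. Where the usage statistics would actually come from is left open; logged form submissions are one assumed possibility.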
Original language: Undefined
Awarding Institution
  • University of Twente
Supervisors/Advisors
  • Apers, Peter Maria Gerardus, Supervisor
  • Trieschnigg, Rudolf Berend, Advisor
  • de Jong, Franciska M.G., Supervisor
  • Hiemstra, Djoerd, Advisor
Award date: 19 Dec 2013
Place of Publication: Enschede
Publisher: University of Twente
Print ISBNs: 978-90-365-3564-9
DOIs: https://doi.org/10.3990/1.9789036535649
Publication status: Published - 19 Dec 2013

Keywords

  • EWI-24175
  • IR-88253
  • METIS-299638

Cite this

Tjin-Kam-Jet, K. (2013). Distributed Deep Web Search. Enschede: University of Twente. https://doi.org/10.3990/1.9789036535649
Tjin-Kam-Jet, Kien. / Distributed Deep Web Search. Enschede : University of Twente, 2013. 124 p.
@phdthesis{3976d8b3805147078fb476698852e122,
title = "Distributed Deep Web Search",
keywords = "EWI-24175, IR-88253, METIS-299638",
author = "Kien Tjin-Kam-Jet",
note = "SIKS dissertation series; no. 2013-34",
year = "2013",
month = "12",
day = "19",
doi = "10.3990/1.9789036535649",
language = "Undefined",
isbn = "978-90-365-3564-9",
publisher = "University of Twente",
address = "Netherlands",
school = "University of Twente",

}

Tjin-Kam-Jet, K 2013, 'Distributed Deep Web Search', University of Twente, Enschede. https://doi.org/10.3990/1.9789036535649

TY - THES

T1 - Distributed Deep Web Search

AU - Tjin-Kam-Jet, Kien

N1 - SIKS dissertation series; no. 2013-34

PY - 2013/12/19

Y1 - 2013/12/19

KW - EWI-24175

KW - IR-88253

KW - METIS-299638

U2 - 10.3990/1.9789036535649

DO - 10.3990/1.9789036535649

M3 - PhD Thesis - Research UT, graduation UT

SN - 978-90-365-3564-9

PB - University of Twente

CY - Enschede

ER -

Tjin-Kam-Jet K. Distributed Deep Web Search. Enschede: University of Twente, 2013. 124 p. https://doi.org/10.3990/1.9789036535649