Deep Web Entity Monitoring

Mohammadreza Khelghati, Djoerd Hiemstra, Maurice van Keulen

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

11 Citations (Scopus)
75 Downloads (Pure)

Abstract

Access to information is an essential factor in decision-making processes across many domains. Broadening the coverage of information available to decision makers is therefore of vital importance. In such an information-thirsty environment, access to every source of information is considered highly valuable. Today, the most common approach to finding and accessing information sources is to submit queries to general search engines such as Google, Yahoo, or Bing. However, these search engines do not cover all the data available on the Web. Besides failing to index every existing web page, they miss the data behind web search forms. This data, known as the hidden web or deep web, is not accessible through search engines. The deep web is estimated to contain several times more data than the surface web, that is, the data accessible through search engines [9, 6]. Although deep web information can be accessed through each source's own interface, finding and querying all the potentially useful sources can be a difficult, time-consuming, and tiring task. Given the huge amount of information that may be related to a person's information needs, it may even be impossible for one person to cover all the deep web sources of interest. There is therefore great demand for applications that facilitate access to the large amount of data locked behind web search forms. Realizing approaches to meet this demand is one of the main issues targeted in this PhD project. Once access to deep web data has been provided, different techniques can be applied to give users additional value from this data. Analyzing data and finding patterns and relationships among data items and data sources are examples of such techniques. In this research, however, the target is monitoring entities that exist in deep web sources.
Original language: Undefined
Title of host publication: Proceedings of the 22nd international conference on World Wide Web companion, WWW 2013
Place of Publication: Republic and Canton of Geneva, Switzerland
Publisher: International World Wide Web Conferences Steering Committee
Pages: 377-382
Number of pages: 5
ISBN (Print): 978-1-4503-2038-2
Publication status: Published - May 2013
Event: 22nd International World Wide Web Conference, WWW 2013 - Rio de Janeiro, Brazil
Duration: 13 May 2013 to 17 May 2013
Conference number: 22
http://www2013.wwwconference.org/

Publication series

Name
Publisher: International World Wide Web Conferences Steering Committee

Conference

Conference: 22nd International World Wide Web Conference, WWW 2013
Abbreviated title: WWW
Country: Brazil
City: Rio de Janeiro
Period: 13/05/13 to 17/05/13
Internet address

Keywords

  • crawling
  • web harvesting
  • EWI-23496
  • METIS-297724
  • IR-86796
  • Deep Web
  • entity monitoring
