TY - GEN
T1 - Designing A General Deep Web Harvester by Harvestability Factor
AU - Khelghati, Mohammadreza
AU - van Keulen, Maurice
AU - Hiemstra, Djoerd
PY - 2014/10/19
Y1 - 2014/10/19
N2 - To make deep web data accessible, harvesters play a crucial role. Targeting different domains and websites increases the need for a general-purpose harvester that can be applied to different settings and situations. To develop such a harvester, a large number of issues must be addressed. To bring all influential elements into one overall picture, this paper introduces a new concept called the harvestability factor (HF). The HF is defined as an attribute of a website (HF_W) or a harvester (HF_H) that represents the extent to which the website can be harvested or the harvester can harvest. These factors are composed of website or harvester features, which are either gathered from the literature or introduced through the authors’ experiments. In addition to enabling designers to evaluate where their products stand from the harvesting perspective, the HF can act as a framework for designing harvesters: designers can define the list of features and prioritize their implementation. To validate the effectiveness of HF in practice, it is shown how the HFs … websites and how this is useful in designing a harvester. To validate HF_H as an evaluation metric, it is shown how it can be calculated for the harvester implemented by the authors. The results show that the developed harvester performs well on the targeted test set, with a score of 14.783 out of 15.
AB - To make deep web data accessible, harvesters play a crucial role. Targeting different domains and websites increases the need for a general-purpose harvester that can be applied to different settings and situations. To develop such a harvester, a large number of issues must be addressed. To bring all influential elements into one overall picture, this paper introduces a new concept called the harvestability factor (HF). The HF is defined as an attribute of a website (HF_W) or a harvester (HF_H) that represents the extent to which the website can be harvested or the harvester can harvest. These factors are composed of website or harvester features, which are either gathered from the literature or introduced through the authors’ experiments. In addition to enabling designers to evaluate where their products stand from the harvesting perspective, the HF can act as a framework for designing harvesters: designers can define the list of features and prioritize their implementation. To validate the effectiveness of HF in practice, it is shown how the HFs … websites and how this is useful in designing a harvester. To validate HF_H as an evaluation metric, it is shown how it can be calculated for the harvester implemented by the authors. The results show that the developed harvester performs well on the targeted test set, with a score of 14.783 out of 15.
KW - EWI-25355
KW - METIS-309685
KW - IR-93326
KW - CR-H.3.3
M3 - Conference contribution
T3 - CEUR Workshop Proceedings
SP - 4
BT - Proceedings of the international ISWC workshop on Surfacing the Deep and the Social Web, SDSW 2014
PB - CEUR
CY - Aachen, Germany
T2 - International ISWC Workshop on Surfacing the Deep and the Social Web, SDSW 2014
Y2 - 19 October 2014 through 23 October 2014
ER -