Duplicate Detection in Probabilistic Data

Fabian Panse, Maurice van Keulen, Ander de Keijzer, Norbert Ritter

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review


Abstract

Collected data often contains uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic databases, an integration of probabilistic data has to be performed. Until now, however, data integration approaches have focused on the integration of certain source data (relational or XML). There is no work on the integration of uncertain source data so far. In this paper, we present a first step towards a concise consolidation of probabilistic data. We focus on duplicate detection as a representative and essential step in an integration process. We present techniques for identifying multiple probabilistic representations of the same real-world entities.
Original language: Undefined
Title of host publication: Proceedings of the 2nd International Workshop on New Trends in Information Integration (NTII 2010)
Place of Publication: Los Alamitos
Publisher: IEEE Computer Society Press
Pages: 179-182
Number of pages: 6
ISBN (Print): 978-1-4244-6522-4
Publication status: Published - Mar 2010
Event: 2nd International Workshop on New Trends in Information Integration (NTII 2010), Long Beach, California, USA: Proceedings of the 2nd International Workshop on New Trends in Information Integration (NTII 2010) - Los Alamitos
Duration: 1 Mar 2010 → …

Publication series

Name
Publisher: IEEE Computer Society Press

Conference

Conference: 2nd International Workshop on New Trends in Information Integration (NTII 2010), Long Beach, California, USA
City: Los Alamitos
Period: 1/03/10 → …

Keywords

  • IR-68597
  • EWI-16565
  • DB-SDI: SCHEMA AND DATA INTEGRATION
  • METIS-270698
