Probabilistic Data Integration

Research output: Contribution to conference › Poster › Other research output


Abstract

Probabilistic data integration is a specific kind of data integration where integration problems such as inconsistency and uncertainty are handled by means of a probabilistic data representation.
The approach is based on the view that data quality problems (as they occur in an integration process) can be modeled as uncertainty, and that this uncertainty is an important result of the integration process. In other words, data quality problems arising during integration are not solved immediately but are explicitly represented in the resulting integrated data. This data can then be stored in a probabilistic database and queried directly, yielding possible or approximate answers. A probabilistic database is a specific kind of DBMS that supports storage, querying and manipulation of uncertain data. It keeps track of alternatives and of the dependencies among them.
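To make "alternatives" and "possible answers" concrete, here is a minimal sketch under possible-worlds semantics. It is not the authors' system. Assumptions: each integrated record holds mutually exclusive alternatives with probabilities, different records are independent (a simplification of the dependencies a real probabilistic DBMS tracks), and the table, names and `answer_probabilities` helper are hypothetical. An answer's probability is the sum of the probabilities of the worlds that return it.

```python
from fractions import Fraction
from itertools import product

# Hypothetical integrated table: two sources disagree on john's city, so the
# conflict is kept as alternatives instead of being resolved. Alternatives of
# one record exclude each other; records are assumed independent.
integrated = {
    "john": [("Enschede", Fraction(7, 10)), ("Eindhoven", Fraction(3, 10))],
    "mary": [("Eindhoven", Fraction(1))],
}

def possible_worlds(table):
    """Enumerate every world (one alternative per record) with its probability."""
    keys = list(table)
    for choice in product(*(table[k] for k in keys)):
        world = {k: value for k, (value, _) in zip(keys, choice)}
        prob = Fraction(1)
        for _, p in choice:
            prob *= p  # independence assumption across records
        yield world, prob

def answer_probabilities(table, query):
    """Probability of each answer: total probability of worlds returning it."""
    result = {}
    for world, prob in possible_worlds(table):
        for answer in query(world):
            result[answer] = result.get(answer, Fraction(0)) + prob
    return result

def lives_in_eindhoven(world):
    return {name for name, city in world.items() if city == "Eindhoven"}

print(answer_probabilities(integrated, lives_in_eindhoven))
```

Here mary is a certain answer (probability 1), while john is only a possible answer with probability 3/10. Enumerating worlds is exponential and only suits a toy example; actual probabilistic databases use compact representations instead.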
While traditional data integration methods more or less explicitly treat uncertainty as a problem, something to be avoided, probabilistic data integration treats uncertainty as an additional source of information that is valuable and should be preserved. It effectively allows the resolution of data integration problems to be postponed. When combined with an effective method for measuring data quality, it also has the potential to enable a pay-as-you-go, good-is-good-enough approach in which small iterations reduce the overall effort of improving the data quality of the integrated result.
In this presentation, we give an overview of various data integration problems, for example entity resolution and merging of grouping data, and show how a probabilistic approach can help solve them. We furthermore illustrate how probabilistic data integration, as an application, calls for further theoretical research on probabilistic database technology, such as more expressive data models and (approximate) querying formalisms. In particular, we present the problem of incorporating a restricted notion of higher-orderedness into datalog without losing its important properties.
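For entity resolution, the probabilistic approach can be sketched as keeping both outcomes of an uncertain match decision rather than forcing a merge. This is a minimal illustration, not the method presented on the poster: the match probability is assumed to come from some matcher, and the merge rule, records and function names are hypothetical.

```python
from fractions import Fraction

def resolve_uncertain(rec_a, rec_b, p_match):
    """Return mutually exclusive alternatives as (entities, probability) pairs
    instead of committing to a hard merge/no-merge decision."""
    merged = {**rec_b, **rec_a}  # hypothetical merge rule: rec_a wins on conflicts
    return [
        ([merged], p_match),            # alternative 1: one real-world person
        ([rec_a, rec_b], 1 - p_match),  # alternative 2: two different people
    ]

def count_distribution(alternatives):
    """Probability distribution over the number of distinct entities."""
    dist = {}
    for entities, prob in alternatives:
        dist[len(entities)] = dist.get(len(entities), Fraction(0)) + prob
    return dist

def expected_count(alternatives):
    """Expected number of distinct entities across the alternatives."""
    return sum((len(entities) * prob for entities, prob in alternatives), Fraction(0))

# Hypothetical records from two sources and an assumed match probability of 4/5.
a = {"name": "J. Smith", "city": "Eindhoven"}
b = {"name": "John Smith", "city": "Eindhoven", "born": 1970}
alts = resolve_uncertain(a, b, Fraction(4, 5))
print(count_distribution(alts))
print(expected_count(alts))
```

A query such as "how many persons are there?" then has a probabilistic answer (1 with probability 4/5, 2 with probability 1/5, an expected 6/5) instead of one that silently depends on a premature decision. If later evidence settles the match, the losing alternative can simply be dropped.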
Original language: English
Number of pages: 1
Publication status: Published - 3 Nov 2017
Event: Dutch-Belgian Database Day, DBDBD 2017 - Eindhoven, Netherlands
Duration: 1 Dec 2017 - 1 Dec 2017
http://wwwis.win.tue.nl/dbdbd2017

Workshop

Workshop: Dutch-Belgian Database Day, DBDBD 2017
Abbreviated title: DBDBD 2017
Country: Netherlands
City: Eindhoven
Period: 1/12/17 - 1/12/17
