Probabilistic Data Integration

    Research output: Chapter in Book/Report/Conference proceeding › Entry for encyclopedia/dictionary › Academic



    Probabilistic data integration (PDI) is a specific kind of data integration where integration problems such as inconsistency and uncertainty are handled by means of a probabilistic data representation. The approach is based on the view that data quality problems (as they occur in an integration process) can be modeled as uncertainty, and this uncertainty is considered an important result of the integration process.
    The PDI process consists of two phases: (i) a quick partial integration, in which certain data quality problems are not solved immediately but are explicitly represented as uncertainty in the resulting integrated data, which is stored in a probabilistic database; and (ii) continuous improvement, achieved by using the data (a probabilistic database can be queried directly, yielding possible or approximate answers) and by gathering evidence (e.g., user feedback) to improve data quality. A probabilistic database is a specific kind of DBMS that allows storage, querying, and manipulation of uncertain data. It keeps track of alternatives and the dependencies among them.
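The two phases can be illustrated with a small sketch. It is not the chapter's implementation or any particular probabilistic DBMS: the data, the probabilities, and the helper names `possible_worlds`, `query`, and `apply_feedback` are all hypothetical. It also assumes that uncertain records are independent of one another, whereas a real probabilistic database also tracks dependencies among alternatives.

```python
from itertools import product

# Phase (i): partial integration. Two sources disagree on Ann's city, so both
# values are kept as mutually exclusive alternatives: (row, probability).
# All rows and probabilities here are made up for illustration.
db = [
    [({"name": "Ann", "city": "Enschede"}, 0.7),
     ({"name": "Ann", "city": "Hengelo"}, 0.3)],
    [({"name": "Bob", "city": "Enschede"}, 1.0)],
]


def possible_worlds(db):
    """Yield (world, probability), choosing one alternative per record.

    Assumes records are independent, so probabilities multiply.
    """
    for combo in product(*db):
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield [row for row, _ in combo], prob


def query(db, predicate):
    """Return, per name, the probability that a matching row exists."""
    result = {}
    for world, prob in possible_worlds(db):
        for row in world:
            if predicate(row):
                result[row["name"]] = result.get(row["name"], 0.0) + prob
    return result


def apply_feedback(record, rejected_row):
    """Phase (ii): drop an alternative a user rejected and renormalize."""
    kept = [(row, p) for row, p in record if row != rejected_row]
    total = sum(p for _, p in kept)
    return [(row, p / total) for row, p in kept]


def in_enschede(row):
    return row["city"] == "Enschede"


# Querying the uncertain data directly gives possible answers with probabilities.
answers = query(db, in_enschede)

# User feedback rules out one alternative; the same query then becomes certain.
db[0] = apply_feedback(db[0], {"name": "Ann", "city": "Hengelo"})
improved = query(db, in_enschede)
```

Before feedback, Ann is a possible answer with probability of about 0.7 and Bob is a certain answer. After the rejected alternative is removed, Ann becomes a certain answer as well. Enumerating every possible world is exponential in the number of uncertain records, so this approach is only suitable for illustrating the semantics.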
    Original language: English
    Title of host publication: Encyclopedia of Big Data Technologies
    Editors: Sherif Sakr, Albert Zomaya
    Place of Publication: Cham
    Number of pages: 9
    ISBN (Electronic): 978-3-319-63962-8
    Publication status: Published - 12 Feb 2018
