Rule-based conditioning of probabilistic data

Maurice van Keulen*, Benjamin Kaminski, Christoph Matheja, Joost P. Katoen

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

    Abstract

    Data interoperability is a major issue in data management for data science and big data analytics. Probabilistic data integration (PDI) is a specific kind of data integration where extraction and integration problems such as inconsistency and uncertainty are handled by means of a probabilistic data representation. This allows a data integration process with two phases: (1) a quick partial integration where data quality problems are represented as uncertainty in the resulting integrated data, and (2) using the uncertain data and continuously improving its quality as more evidence is gathered. The main contribution of this paper is an iterative approach for incorporating evidence of users in the probabilistically integrated data. Evidence can be specified as hard or soft rules (i.e., rules that are uncertain themselves).
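
    To make the idea of conditioning on rules concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a toy possible-worlds representation built from independent choices, and it assumes one simple reading of a soft rule: worlds that violate the rule keep a fraction (1 - confidence) of their weight, so a hard rule is the special case confidence = 1.0. The names possible_worlds, condition, same_person and phone are hypothetical.

```python
from itertools import product


def possible_worlds(choices):
    """Enumerate worlds from independent choices given as {name: {value: prob}}."""
    names = list(choices)
    worlds = []
    for combo in product(*(choices[n].items() for n in names)):
        prob = 1.0
        for _, p in combo:
            prob *= p
        world = dict(zip(names, (value for value, _ in combo)))
        worlds.append((world, prob))
    return worlds


def condition(worlds, rule, confidence=1.0):
    """Reweight worlds by a rule and renormalise.

    ASSUMPTION (not from the paper): a world violating the rule keeps a
    fraction (1 - confidence) of its weight. confidence=1.0 is a hard rule.
    """
    weighted = [
        (w, p * (1.0 if rule(w) else 1.0 - confidence)) for w, p in worlds
    ]
    total = sum(p for _, p in weighted)
    if total == 0:
        raise ValueError("rule contradicts every possible world")
    return [(w, p / total) for w, p in weighted if p > 0]


# Hypothetical integration result: two records may refer to the same person,
# and the extracted phone number is uncertain.
choices = {
    "same_person": {True: 0.6, False: 0.4},
    "phone": {"A": 0.5, "B": 0.5},
}


# Hypothetical user evidence: if it is the same person, the phone must be "A".
def rule(world):
    return (not world["same_person"]) or world["phone"] == "A"


hard = condition(possible_worlds(choices), rule)
soft = condition(possible_worlds(choices), rule, confidence=0.5)
```

    Under these assumptions, the hard rule removes the single violating world and renormalises the remaining three, while the soft rule keeps all four worlds and only reduces the weight of the violating one. Enumerating worlds explicitly is exponential in the number of choices, so this is only suitable for illustration.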

    Original language: English
    Title of host publication: Scalable Uncertainty Management
    Subtitle of host publication: 12th International Conference, SUM 2018, Milan, Italy, October 3-5, 2018, Proceedings
    Editors: Davide Ciucci, Gabriella Pasi, Barbara Vantaggi
    Publisher: Springer
    Pages: 290-305
    Number of pages: 16
    ISBN (Electronic): 978-3-030-00461-3
    ISBN (Print): 9783030004606
    DOIs: https://doi.org/10.1007/978-3-030-00461-3_20
    Publication status: Published - 1 Jan 2018
    Event: 12th International Conference on Scalable Uncertainty Management 2018 - Milan, Italy
    Duration: 3 Oct 2018 to 5 Oct 2018
    Conference number: 12
    http://www.ir.disco.unimib.it/sum2018/

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 11142 LNAI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 12th International Conference on Scalable Uncertainty Management 2018
    Abbreviated title: SUM 2018
    Country: Italy
    City: Milan
    Period: 3/10/18 to 5/10/18
    Internet address: http://www.ir.disco.unimib.it/sum2018/

    Keywords

    • Data cleaning
    • Data integration
    • Information extraction
    • Probabilistic databases
    • Probabilistic programming

    Cite this

    van Keulen, M., Kaminski, B., Matheja, C., & Katoen, J. P. (2018). Rule-based conditioning of probabilistic data. In D. Ciucci, G. Pasi, & B. Vantaggi (Eds.), Scalable Uncertainty Management: 12th International Conference, SUM 2018, Milan, Italy, October 3-5, 2018, Proceedings (pp. 290-305). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11142 LNAI). Springer. https://doi.org/10.1007/978-3-030-00461-3_20