Rule-based conditioning of probabilistic data

Maurice van Keulen, Benjamin Kaminski, Christoph Matheja, Joost P. Katoen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic > peer-review

Abstract

Data interoperability is a major issue in data management for data science and big data analytics. Probabilistic data integration (PDI) is a specific kind of data integration where extraction and integration problems such as inconsistency and uncertainty are handled by means of a probabilistic data representation. This allows a data integration process with two phases: (1) a quick partial integration where data quality problems are represented as uncertainty in the resulting integrated data, and (2) using the uncertain data and continuously improving its quality as more evidence is gathered. The main contribution of this paper is an iterative approach for incorporating evidence of users in the probabilistically integrated data. Evidence can be specified as hard or soft rules (i.e., rules that are uncertain themselves).
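
The distinction between hard and soft rules can be illustrated with a minimal sketch. It is not taken from the paper: it assumes an explicit possible-worlds representation and, for soft rules, assumes Jeffrey-style conditioning (mixing the distribution conditioned on the rule with the one conditioned on its negation). The names `WORLDS`, `city_implies_country`, `condition_hard`, and `condition_soft` are hypothetical, and the paper's own semantics for soft rules may differ.

```python
from fractions import Fraction

# Hypothetical integrated data: each possible world is a (city, country) pair
# with a probability. Two sources disagree, so one world is inconsistent.
WORLDS = {
    ("Milan", "Italy"): Fraction(1, 2),
    ("Milan", "France"): Fraction(1, 4),
    ("Paris", "France"): Fraction(1, 4),
}


def city_implies_country(world):
    """Hypothetical user rule: if the city is Milan, the country is Italy."""
    city, country = world
    return city != "Milan" or country == "Italy"


def condition_hard(worlds, rule):
    """Hard rule: drop every world that violates the rule, then renormalize."""
    kept = {w: p for w, p in worlds.items() if rule(w)}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("the rule contradicts every possible world")
    return {w: p / total for w, p in kept.items()}


def condition_soft(worlds, rule, confidence):
    """Soft rule (assumed Jeffrey-style update): the rule holds with
    probability `confidence`, so satisfying worlds share `confidence`
    and violating worlds share the remaining mass."""
    sat = sum(p for w, p in worlds.items() if rule(w))
    unsat = sum(p for w, p in worlds.items() if not rule(w))
    if sat == 0 or unsat == 0:
        raise ValueError("update needs both satisfying and violating worlds")
    return {
        w: confidence * p / sat if rule(w) else (1 - confidence) * p / unsat
        for w, p in worlds.items()
    }


hard = condition_hard(WORLDS, city_implies_country)
soft = condition_soft(WORLDS, city_implies_country, Fraction(9, 10))
for name, dist in (("hard", hard), ("soft", soft)):
    print(name, {w: str(p) for w, p in dist.items()})
```

Under these assumptions, a hard rule eliminates the inconsistent world entirely, whereas a soft rule only shifts probability mass away from it, so later evidence can still revise the result.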

Original language: English
Title of host publication: Scalable Uncertainty Management
Subtitle of host publication: 12th International Conference, SUM 2018, Milan, Italy, October 3-5, 2018, Proceedings
Editors: Davide Ciucci, Gabriella Pasi, Barbara Vantaggi
Publisher: Springer
Pages: 290-305
Number of pages: 16
ISBN (Electronic): 978-3-030-00461-3
ISBN (Print): 9783030004606
DOI: https://doi.org/10.1007/978-3-030-00461-3_20
Publication status: Published - 1 Jan 2018
Event: 12th International Conference on Scalable Uncertainty Management 2018 - Milan, Italy
Duration: 3 Oct 2018 to 5 Oct 2018
Conference number: 12
http://www.ir.disco.unimib.it/sum2018/

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11142 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 12th International Conference on Scalable Uncertainty Management 2018
Abbreviated title: SUM 2018
Country: Italy
City: Milan
Period: 3/10/18 to 5/10/18
Internet address: http://www.ir.disco.unimib.it/sum2018/

Fingerprint

Data integration
Conditioning
Interoperability
Information management
Uncertainty
Uncertain Data
Data Quality
Data Management
Inconsistency
Partial
Evidence

Keywords

  • Data cleaning
  • Data integration
  • Information extraction
  • Probabilistic databases
  • Probabilistic programming

Cite this

van Keulen, M., Kaminski, B., Matheja, C., & Katoen, J. P. (2018). Rule-based conditioning of probabilistic data. In D. Ciucci, G. Pasi, & B. Vantaggi (Eds.), Scalable Uncertainty Management: 12th International Conference, SUM 2018, Milan, Italy, October 3-5, 2018, Proceedings (pp. 290-305). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11142 LNAI). Springer. https://doi.org/10.1007/978-3-030-00461-3_20
@inproceedings{32df526e19014b2e943018b11d9145d6,
title = "Rule-based conditioning of probabilistic data",
abstract = "Data interoperability is a major issue in data management for data science and big data analytics. Probabilistic data integration (PDI) is a specific kind of data integration where extraction and integration problems such as inconsistency and uncertainty are handled by means of a probabilistic data representation. This allows a data integration process with two phases: (1) a quick partial integration where data quality problems are represented as uncertainty in the resulting integrated data, and (2) using the uncertain data and continuously improving its quality as more evidence is gathered. The main contribution of this paper is an iterative approach for incorporating evidence of users in the probabilistically integrated data. Evidence can be specified as hard or soft rules (i.e., rules that are uncertain themselves).",
keywords = "Data cleaning, Data integration, Information extraction, Probabilistic databases, Probabilistic programming",
author = "{van Keulen}, Maurice and Benjamin Kaminski and Christoph Matheja and Katoen, {Joost P.}",
year = "2018",
month = "1",
day = "1",
doi = "10.1007/978-3-030-00461-3_20",
language = "English",
isbn = "9783030004606",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer",
pages = "290--305",
editor = "Davide Ciucci and Gabriella Pasi and Barbara Vantaggi",
booktitle = "Scalable Uncertainty Management",
}

TY - GEN

T1 - Rule-based conditioning of probabilistic data

AU - van Keulen, Maurice

AU - Kaminski, Benjamin

AU - Matheja, Christoph

AU - Katoen, Joost P.

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Data interoperability is a major issue in data management for data science and big data analytics. Probabilistic data integration (PDI) is a specific kind of data integration where extraction and integration problems such as inconsistency and uncertainty are handled by means of a probabilistic data representation. This allows a data integration process with two phases: (1) a quick partial integration where data quality problems are represented as uncertainty in the resulting integrated data, and (2) using the uncertain data and continuously improving its quality as more evidence is gathered. The main contribution of this paper is an iterative approach for incorporating evidence of users in the probabilistically integrated data. Evidence can be specified as hard or soft rules (i.e., rules that are uncertain themselves).

KW - Data cleaning

KW - Data integration

KW - Information extraction

KW - Probabilistic databases

KW - Probabilistic programming

UR - http://www.scopus.com/inward/record.url?scp=85054884019&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-00461-3_20

DO - 10.1007/978-3-030-00461-3_20

M3 - Conference contribution

SN - 9783030004606

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 290

EP - 305

BT - Scalable Uncertainty Management

A2 - Ciucci, Davide

A2 - Pasi, Gabriella

A2 - Vantaggi, Barbara

PB - Springer

ER -
