IMPrECISE: Good-is-good-enough Data Integration

Christoph Koch, B. König-Ries, V. Markl, Maurice van Keulen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic

Abstract

The IMPrECISE system is a probabilistic XML database system that supports near-automatic integration of XML documents. The user only needs to configure the system with a few simple knowledge rules that allow it to eliminate enough nonsense possibilities. We demonstrate the integration process under varying degrees of confusion and with different sets of rules. Even when an integrated document still contains much uncertainty, it can be queried effectively: the system produces a sequence of possible result elements ranked by likelihood. User feedback on query results further reduces uncertainty, which in a sense continues the semantic integration process incrementally. We demonstrate querying on integrated documents and measure answer quality with adapted precision and recall measures. The user feedback mechanism has not yet been implemented and therefore cannot be demonstrated. IMPrECISE is implemented as an XQuery module for the XML DBMS MonetDB/XQuery, so the demo also illustrates the power of this XML DBMS and of XQuery as both a query language and a programming language.
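
The idea of returning possible result elements ranked by likelihood can be illustrated with a minimal Python sketch. It is not the IMPrECISE implementation (which is an XQuery module); it assumes a toy possible-worlds model in which each world has a probability and yields a set of answers for some query. The data, the `worlds` list, and the `rank_answers` function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data (not from the paper): each possible world is a pair of
# (probability of the world, answers a query would return in that world).
worlds = [
    (0.5, ["John", "Jon"]),
    (0.3, ["John"]),
    (0.2, ["Jon", "Joan"]),
]


def rank_answers(possible_worlds):
    """Rank answers by the total probability of the worlds containing them."""
    likelihood = defaultdict(float)
    for prob, answers in possible_worlds:
        # Count each answer at most once per world.
        for answer in set(answers):
            likelihood[answer] += prob
    # Sort by descending likelihood, breaking ties alphabetically.
    return sorted(likelihood.items(), key=lambda item: (-item[1], item[0]))


ranked = rank_answers(worlds)
for answer, prob in ranked:
    print(f"{answer}: {prob:.2f}")
```

Under this assumed model, an answer that occurs in many high-probability worlds rises to the top of the ranking, while answers that occur only in unlikely worlds remain available further down the list.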
Original language: Undefined
Title of host publication: 08421 Abstracts Collection - Uncertainty Management in Information Systems
Editors: C. Koch, B. König-Ries, V. Markl, Maurice van Keulen
Place of Publication: Dagstuhl, Germany
Publisher: Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik
Pages: 24-25
Number of pages: 1
ISBN (Print): 1862-4405
Publication status: Published - Mar 2009
Event: Uncertainty Management in Information Systems: Dagstuhl Seminar 08421 - Dagstuhl, Germany
Duration: 12 Oct 2008 to 17 Oct 2008

Publication series

Name: Dagstuhl Seminar Proceedings
Publisher: Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik
Number: 08421
ISSN (Print): 1862-4405

Workshop

Workshop: Uncertainty Management in Information Systems
Country: Germany
City: Dagstuhl
Period: 12/10/08 to 17/10/08
Other: 12 - 17 Oct 2008

Keywords

  • METIS-265203
  • EWI-15241
  • Data Integration
  • IR-65442
  • probabilistic databases
  • data quality
  • entity resolution
  • Uncertainty management

Cite this

Koch, C., König-Ries, B., Markl, V., & van Keulen, M. (2009). IMPrECISE: Good-is-good-enough Data Integration. In C. Koch, B. König-Ries, V. Markl, & M. van Keulen (Eds.), 08421 Abstracts Collection - Uncertainty Management in Information Systems (pp. 24-25). (Dagstuhl Seminar Proceedings; No. 08421). Dagstuhl, Germany: Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik.