Data integration has been a challenging problem for decades. In an ambient environment, where many autonomous devices have their own information sources and network connectivity is ad hoc and peer-to-peer, it becomes a serious bottleneck. Moreover, the number of information sources keeps growing, both per device and in total. To enable devices to exchange information without user interaction at data integration time and without extensive semantic annotations, a probabilistic approach seems promising: it teaches the device how to cope with the uncertainty that arises during data integration. Unfortunately, without any kind of world knowledge, almost everything becomes uncertain, so maintaining all possibilities produces huge integrated information sources. A promising solution is to integrate data sources automatically, using very simple knowledge rules to rule out most of the nonsense possibilities, to store the remaining possibilities as uncertainty in the database, and to resolve them during querying by means of user feedback. In this chapter we introduce this “good is good-enough” integration approach and explain the uncertainty model used to capture the remaining integration possibilities. We show that with this strategy the time needed to integrate documents decreases drastically, while the accuracy of the integrated document increases over time.
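The three steps named above (rule out nonsense possibilities with simple knowledge rules, store what remains as uncertainty, resolve it through user feedback) can be illustrated with a minimal Python sketch. It is not the chapter's uncertainty model: the record layout, the `surname_rule` knowledge rule, the uniform probabilities, and the `integrate` and `feedback` helpers are all hypothetical names and assumptions chosen for illustration.

```python
def surname_rule(a, b):
    """Hypothetical knowledge rule: two records can only describe the
    same person if their surnames (last name token) agree."""
    return a["name"].split()[-1].lower() == b["name"].split()[-1].lower()


def integrate(source_a, source_b, rules):
    """Pair every record of source_a with every record of source_b.

    Each pair keeps the alternatives that no rule excludes, with a
    uniform probability over them (an assumption of this sketch).
    """
    uncertain = {}
    for i, a in enumerate(source_a):
        for j, b in enumerate(source_b):
            options = ["different"]
            if all(rule(a, b) for rule in rules):
                options.append("same")
            p = 1.0 / len(options)
            uncertain[(i, j)] = {option: p for option in options}
    return uncertain


def feedback(uncertain, pair, rejected):
    """Apply user feedback at query time: drop the rejected alternative
    for one pair and renormalise the remaining probabilities."""
    remaining = {o: p for o, p in uncertain[pair].items() if o != rejected}
    if not remaining:
        raise ValueError("feedback would remove every alternative")
    total = sum(remaining.values())
    uncertain[pair] = {o: p / total for o, p in remaining.items()}


# Illustrative data only.
source_a = [{"name": "John Smith", "phone": "1111"}]
source_b = [
    {"name": "Jon Smith", "phone": "2222"},
    {"name": "Mary Jones", "phone": "3333"},
]

uncertain = integrate(source_a, source_b, [surname_rule])
```

In this sketch the rule settles the pair with "Mary Jones" at integration time, so only the pair with "Jon Smith" is stored as uncertain. A later call such as `feedback(uncertain, (0, 0), "different")` would remove that remaining doubt, which mirrors how accuracy can improve over time without blocking the initial integration.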
Series: Studies in Fuzziness and Soft Computing