Repurposing and probabilistic integration of data

B. Wanders

Research output: Thesis; PhD Thesis - Research UT, graduation UT; Academic

Abstract

Besides the scientific paradigms of empiricism, mathematical modelling, and simulation, the method of combining and analysing data in novel ways has become a main research paradigm capable of tackling research questions that could not be answered before. To speed up research in this new paradigm, scientists are reusing and integrating data originally gathered for different purposes. This repurposing of data requires a thorough understanding of the data sources used. Data understanding is an ongoing process in which the scientist gains insight into the semantics and quality of the data through exploration and use. In this book we propose a flexible method to guide this exploration and to highlight the places where automated assistance can be used to the greatest effect. The method is based on the principles of 'good is good enough' and 'pay as you go', meaning that the scientist puts in only as much effort as is necessary to get the integrated data to the level of quality that he needs to continue his research. This book pursues two directions of research. The first is an investigation of note taking. By documenting his exploration efforts, the scientist can share his understanding of the data sources with others. To support the scientist in this, a prototype note-taking system is created. This system offers a compromise between the exploratory workflow of the scientist and the rigid procedures of the research institute. The second direction is the use of probabilistic data to support the 'pay as you go' principle. A formal framework for the creation of probabilistic data models is introduced. By keeping data accessible even if there are contradictions or multiple alternatives, the scientist can postpone data integration choices that would otherwise have prevented him from continuing with his work.

Original language: English
Awarding Institution
  • University of Twente
Supervisors/Advisors
  • Apers, Peter Maria Gerardus, Supervisor
  • van Keulen, Maurice, Advisor
Award date: 16 Jun 2016
Place of Publication: Enschede
Publisher: Universiteit Twente
Print ISBNs: 978-90-365-4110-7
DOIs: https://doi.org/10.3990/1.9789036541107
Publication status: Published - 16 Jun 2016

Cite this

Wanders, B. / Repurposing and probabilistic integration of data. Enschede : Universiteit Twente, 2016. 212 p.
@phdthesis{dede31a0c31d4ff8a253ea6002d7e11f,
title = "Repurposing and probabilistic integration of data",
abstract = "Besides the scientific paradigms of empiricism, mathematical modelling, and simulation, the method of combining and analysing data in novel ways has become a main research paradigm capable of tackling research questions that could not be answered before. To speed up research in this new paradigm, scientists are reusing and integrating data originally gathered for different purposes. This repurposing of data requires a thorough understanding of the data sources used. Data understanding is an ongoing process in which the scientist gains insight into the semantics and quality of the data through exploration and use. In this book we propose a flexible method to guide this exploration and to highlight the places where automated assistance can be used to the greatest effect. The method is based on the principles of `good is good enough' and `pay as you go', meaning that the scientist puts in only as much effort as is necessary to get the integrated data to the level of quality that he needs to continue his research. This book pursues two directions of research. The first is an investigation of note taking. By documenting his exploration efforts, the scientist can share his understanding of the data sources with others. To support the scientist in this, a prototype note-taking system is created. This system offers a compromise between the exploratory workflow of the scientist and the rigid procedures of the research institute. The second direction is the use of probabilistic data to support the `pay as you go' principle. A formal framework for the creation of probabilistic data models is introduced. By keeping data accessible even if there are contradictions or multiple alternatives, the scientist can postpone data integration choices that would otherwise have prevented him from continuing with his work.",
author = "B. Wanders",
year = "2016",
month = "6",
day = "16",
doi = "10.3990/1.9789036541107",
language = "English",
isbn = "978-90-365-4110-7",
series = "SIKS dissertation series",
publisher = "Universiteit Twente",
number = "2016-24",
school = "University of Twente",
}

TY - THES
T1 - Repurposing and probabilistic integration of data
AU - Wanders, B.
PY - 2016/6/16
Y1 - 2016/6/16
AB - Besides the scientific paradigms of empiricism, mathematical modelling, and simulation, the method of combining and analysing data in novel ways has become a main research paradigm capable of tackling research questions that could not be answered before. To speed up research in this new paradigm, scientists are reusing and integrating data originally gathered for different purposes. This repurposing of data requires a thorough understanding of the data sources used. Data understanding is an ongoing process in which the scientist gains insight into the semantics and quality of the data through exploration and use. In this book we propose a flexible method to guide this exploration and to highlight the places where automated assistance can be used to the greatest effect. The method is based on the principles of 'good is good enough' and 'pay as you go', meaning that the scientist puts in only as much effort as is necessary to get the integrated data to the level of quality that he needs to continue his research. This book pursues two directions of research. The first is an investigation of note taking. By documenting his exploration efforts, the scientist can share his understanding of the data sources with others. To support the scientist in this, a prototype note-taking system is created. This system offers a compromise between the exploratory workflow of the scientist and the rigid procedures of the research institute. The second direction is the use of probabilistic data to support the 'pay as you go' principle. A formal framework for the creation of probabilistic data models is introduced. By keeping data accessible even if there are contradictions or multiple alternatives, the scientist can postpone data integration choices that would otherwise have prevented him from continuing with his work.
U2 - 10.3990/1.9789036541107
DO - 10.3990/1.9789036541107
M3 - PhD Thesis - Research UT, graduation UT
SN - 978-90-365-4110-7
T3 - SIKS dissertation series
PB - Universiteit Twente
CY - Enschede
ER -

Wanders B. Repurposing and probabilistic integration of data. Enschede: Universiteit Twente, 2016. 212 p. (SIKS dissertation series; 2016-24). https://doi.org/10.3990/1.9789036541107