Why good data analysts need to be critical synthesists: Determining the role of semantics in data analysis

Simon Scheider, Frank Ostermann, Benjamin Adams

Research output: Contribution to journal › Article › Academic › peer-review

10 Citations (Scopus)

Abstract

In this article, we critically examine the role of semantic technology in data driven analysis. We explain why learning from data is more than just analyzing data, including also a number of essential synthetic parts that suggest a revision of George Box’s model of data analysis in statistics. We review arguments from statistical learning under uncertainty, workflow reproducibility, as well as from philosophy of science, and propose an alternative, synthetic learning model that takes into account semantic conflicts, observation, biased model and data selection, as well as interpretation into background knowledge. The model highlights and clarifies the different roles that semantic technology may have in fostering reproduction and reuse of data analysis across communities of practice under the conditions of informational uncertainty. We also investigate the role of semantic technology in current analysis and workflow tools, compare it with the requirements of our model, and conclude with a roadmap of 8 challenging research problems which currently seem largely unaddressed.
Original language: English
Pages (from-to): 11-22
Number of pages: 12
Journal: Future generation computer systems
Volume: 72
DOI: 10.1016/j.future.2017.02.046
Publication status: Published - 2017

Fingerprint

Semantics
Statistics
Uncertainty

Keywords

  • METIS-322130
  • ITC-ISI-JOURNAL-ARTICLE

Cite this

@article{f15671e428d0470089a47335b23a3e11,
title = "Why good data analysts need to be critical synthesists: Determining the role of semantics in data analysis",
abstract = "In this article, we critically examine the role of semantic technology in data driven analysis. We explain why learning from data is more than just analyzing data, including also a number of essential synthetic parts that suggest a revision of George Box’s model of data analysis in statistics. We review arguments from statistical learning under uncertainty, workflow reproducibility, as well as from philosophy of science, and propose an alternative, synthetic learning model that takes into account semantic conflicts, observation, biased model and data selection, as well as interpretation into background knowledge. The model highlights and clarifies the different roles that semantic technology may have in fostering reproduction and reuse of data analysis across communities of practice under the conditions of informational uncertainty. We also investigate the role of semantic technology in current analysis and workflow tools, compare it with the requirements of our model, and conclude with a roadmap of 8 challenging research problems which currently seem largely unaddressed.",
keywords = "METIS-322130, ITC-ISI-JOURNAL-ARTICLE",
author = "Simon Scheider and Frank Ostermann and Benjamin Adams",
year = "2017",
doi = "10.1016/j.future.2017.02.046",
language = "English",
volume = "72",
pages = "11--22",
journal = "Future generation computer systems",
issn = "0167-739X",
publisher = "Elsevier",
}

Why good data analysts need to be critical synthesists: Determining the role of semantics in data analysis. / Scheider, Simon; Ostermann, Frank; Adams, Benjamin.

In: Future generation computer systems, Vol. 72, 2017, p. 11-22.

TY - JOUR

T1 - Why good data analysts need to be critical synthesists

T2 - Determining the role of semantics in data analysis

AU - Scheider, Simon

AU - Ostermann, Frank

AU - Adams, Benjamin

PY - 2017

Y1 - 2017

N2 - In this article, we critically examine the role of semantic technology in data driven analysis. We explain why learning from data is more than just analyzing data, including also a number of essential synthetic parts that suggest a revision of George Box’s model of data analysis in statistics. We review arguments from statistical learning under uncertainty, workflow reproducibility, as well as from philosophy of science, and propose an alternative, synthetic learning model that takes into account semantic conflicts, observation, biased model and data selection, as well as interpretation into background knowledge. The model highlights and clarifies the different roles that semantic technology may have in fostering reproduction and reuse of data analysis across communities of practice under the conditions of informational uncertainty. We also investigate the role of semantic technology in current analysis and workflow tools, compare it with the requirements of our model, and conclude with a roadmap of 8 challenging research problems which currently seem largely unaddressed.

AB - In this article, we critically examine the role of semantic technology in data driven analysis. We explain why learning from data is more than just analyzing data, including also a number of essential synthetic parts that suggest a revision of George Box’s model of data analysis in statistics. We review arguments from statistical learning under uncertainty, workflow reproducibility, as well as from philosophy of science, and propose an alternative, synthetic learning model that takes into account semantic conflicts, observation, biased model and data selection, as well as interpretation into background knowledge. The model highlights and clarifies the different roles that semantic technology may have in fostering reproduction and reuse of data analysis across communities of practice under the conditions of informational uncertainty. We also investigate the role of semantic technology in current analysis and workflow tools, compare it with the requirements of our model, and conclude with a roadmap of 8 challenging research problems which currently seem largely unaddressed.

KW - METIS-322130

KW - ITC-ISI-JOURNAL-ARTICLE

UR - http://ezproxy2.utwente.nl/login?url=https://webapps.itc.utwente.nl/library/2017/isi/ostermann_why.pdf

U2 - 10.1016/j.future.2017.02.046

DO - 10.1016/j.future.2017.02.046

M3 - Article

VL - 72

SP - 11

EP - 22

JO - Future generation computer systems

JF - Future generation computer systems

SN - 0167-739X

ER -