Challenges and Opportunities for Statistics in the Era of Data Science

  • Claudia Kirch
  • Soumendra Lahiri
  • Harald Binder
  • Werner Brannath
  • Ivor Cribben
  • Holger Dette
  • Philipp Doebler
  • Oliver Feng
  • Axel Gandy
  • Sonja Greven
  • Barbara Hammer
  • Stefan Harmeling
  • Thomas Hotz
  • Göran Kauermann
  • Joscha Krause
  • Georg Krempl
  • Alicia Nieto-Reyes
  • Ostap Okhrin
  • Hernando Ombao
  • Florian Pein
  • Michal Pešta
  • Dimitris Politis
  • Li-Xuan Qin
  • Tom Rainforth
  • Holger Rauhut
  • Henry Reeve
  • David Salinas
  • Johannes Schmidt-Hieber
  • Clayton Scott
  • Johan Segers
  • Myra Spiliopoulou
  • Adalbert Wilhelm
  • Ines Wilms
  • Yi Yu
  • Johannes Lederer

Research output: Contribution to journal, Article, Academic, peer-review

Abstract

Statistics as a scientific discipline is currently facing the great challenge of finding its place in data science once more. At the beginning of the last century, the development of the discipline of statistics was initiated by data-related research questions. Today, it is often viewed as not having kept pace with current developments in data science, which are largely focused on algorithmic, exploratory, and computational aspects and are often driven by other disciplines, such as computer science. However, statistics can, and should, contribute to the advances of data science. Of greatest interest are the strengths of statistics, such as the mathematical focus that leads to theoretical guarantees. These include methods for formal modeling, hypothesis tests, uncertainty quantification, and statistical inference. Of particular interest are also established statistical frameworks to handle causality or data deficiencies such as dependence, missingness, biases, or confounding.

This article summarizes the findings of a discussion workshop on the topic that was held in June 2023 in Hannover, Germany. The discussion centered around the following questions: How must statistics be set up so that it can contribute (more) to modern data science? In which direction should it develop further? Which strengths can already be used now? What conditions must be created so that this can succeed? What can be done to arrive at a common language? What is the added value of formal modeling, inference, and the mathematical perspective taken in statistics?

Original language: English
Number of pages: 60
Journal: Harvard Data Science Review
Volume: 7
Issue number: 2
DOIs
Publication status: Published - 28 May 2025

Keywords

  • Generalization
  • Identifiability
  • Machine Learning (ML)
  • Reproducibility
  • Uncertainty Quantification
