Prediction instrument development for complex domains

Sjoerd van der Spoel

Research output: Thesis > PhD Thesis - Research UT, graduation UT > Academic

Abstract

Developing prediction instruments requires understanding the specific domain they are developed for. The more complex the domain is, the more factors affect the prediction outcome. There is some consensus that, for a complex domain, the analysis should be specific to that domain. The complexity of the domain implies that there are likely to be factors at play that are unique to that domain. A literature review of similar domains is therefore unlikely to uncover all important variables. We have developed a method for prediction instrument development that captures the 'soft' aspects that are specific to the domain. This method, Prediction Instrument Development for Complex Domains (PID-CD), starts by asking those directly involved with the domain to brainstorm about which factors affect the outcome to be predicted. Combined with observations from a field study, this leads to a set of testable hypotheses. These are combined with a set of constraints, which determine the conditions under which a predictive model is actionable. The hypotheses are converted to data selection and cleaning strategies that determine which variables to use in a predictive model and how noise should be removed from these variables. The constraints determine which strategies are converted to predictive models, and which predictive models have sufficient predictive performance. Finally, the domain experts and decision makers determine which predictive model will be used as the basis for a prediction instrument. The main contribution of this thesis is a rigorous and transparent method for domain analysis as part of prediction instrument development. We have demonstrated that this method of soft-inclusive domain analysis leads to better predictive power than would be achieved with soft-exclusive domain analysis, because it gives a more complete view of which factors in the domain affect the prediction outcome. Furthermore, this method allows one to directly relate predictive power to the hypotheses, contributing to better domain understanding.
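
The flow described in the abstract (hypotheses become data selection and cleaning strategies, and constraints filter which strategies become models and which models perform well enough) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The names `Strategy`, `Constraints`, `shortlist`, and `fit_and_score`, and the two concrete constraint types (variable availability and a minimum score), are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Set, Tuple

Row = Dict[str, float]


@dataclass
class Strategy:
    """A data selection and cleaning strategy derived from one hypothesis."""
    hypothesis: str                           # label of the hypothesis it tests
    variables: Sequence[str]                  # variables the model may use
    clean: Callable[[List[Row]], List[Row]]   # how noise is removed


@dataclass
class Constraints:
    """Illustrative constraints; the real ones come from the domain."""
    available_variables: Set[str]  # assumed actionability constraint
    min_score: float               # assumed performance threshold


def shortlist(
    strategies: Sequence[Strategy],
    constraints: Constraints,
    rows: List[Row],
    fit_and_score: Callable[[List[Row], Sequence[str]], float],
) -> List[Tuple[str, float]]:
    """Return (hypothesis, score) pairs for models that meet the constraints."""
    results = []
    for s in strategies:
        # Only strategies that use available variables are turned into models.
        if not set(s.variables) <= constraints.available_variables:
            continue
        score = fit_and_score(s.clean(rows), s.variables)
        # Only models with sufficient predictive performance are kept.
        if score >= constraints.min_score:
            results.append((s.hypothesis, score))
    # Best-scoring candidates first, for review by experts.
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Because every shortlisted score stays paired with the hypothesis that produced it, predictive performance can be read back against the hypotheses. The final choice among shortlisted models is left to domain experts and decision makers, as the abstract describes, so the sketch stops at the shortlist.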
Original language: English
Awarding Institution
  • University of Twente
Supervisors/Advisors
  • van Hillegersberg, Jos, Supervisor
  • Amrit, Chintan, Advisor
Award date: 14 Sep 2016
Place of Publication: Enschede
Publisher: Universiteit Twente
Print ISBNs: 978-90-365-4174-9
DOIs: https://doi.org/10.3990/1.9789036541749
Publication status: Published - 14 Sep 2016

Keywords

  • METIS-317639
  • IR-101042

Cite this

van der Spoel, Sjoerd. / Prediction instrument development for complex domains. Enschede : Universiteit Twente, 2016. 224 p.
@phdthesis{76d6f0f2460745dea687662e46b86571,
title = "Prediction instrument development for complex domains",
abstract = "Developing prediction instruments requires understanding the specific domain they are developed for. The more complex the domain is, the more factors affect the prediction outcome. For a complex domain, there is some consensus that it is important that the analysis is specific to that domain. The complexity of the domain implies that there are likely to be factors at play in the domain that are unique to that domain. A literature review of similar domains is therefore unlikely to uncover all important variables. We have developed a method for prediction instrument development that captures the 'soft' aspects that are specific to the domain. This method, Prediction Instrument Development for Complex Domains (PID-CD) starts with asking those directly involved with the domain to brainstorm on what affects what is to be predicted. Combined with observations from a field study, this leads to a set of testable hypotheses. These are combined with a set of constraints, which determine the conditions under which a predictive model is actionable. The hypotheses are converted to data selection and cleaning strategies, that determine which variables to use in a predictive model, and how noise should be removed from these variables. The constraints determine which strategies are converted to predictive models, and which predictive models have sufficient predictive performance. The domain experts and decision makers finally determine which predictive model will be used as the basis for a prediction instrument. The main contribution of this thesis is a rigorous and transparent method for domain analysis as part of prediction instrument development. We have demonstrated that this method of soft-inclusive domain analysis leads to better predictive power than would be achieved with soft-exclusive domain analysis, through having a more complete view of what factors in the domain affect the prediction outcome. 
Furthermore, this method allows one to directly relate predictive power with the hypotheses, contributing to better domain understanding.",
keywords = "METIS-317639, IR-101042",
author = "{van der Spoel}, Sjoerd",
year = "2016",
month = "9",
day = "14",
doi = "10.3990/1.9789036541749",
language = "English",
isbn = "978-90-365-4174-9",
publisher = "Universiteit Twente",
school = "University of Twente"
}

TY - THES
T1 - Prediction instrument development for complex domains
AU - van der Spoel, Sjoerd
PY - 2016/9/14
Y1 - 2016/9/14
AB - Developing prediction instruments requires understanding the specific domain they are developed for. The more complex the domain is, the more factors affect the prediction outcome. For a complex domain, there is some consensus that it is important that the analysis is specific to that domain. The complexity of the domain implies that there are likely to be factors at play in the domain that are unique to that domain. A literature review of similar domains is therefore unlikely to uncover all important variables. We have developed a method for prediction instrument development that captures the 'soft' aspects that are specific to the domain. This method, Prediction Instrument Development for Complex Domains (PID-CD) starts with asking those directly involved with the domain to brainstorm on what affects what is to be predicted. Combined with observations from a field study, this leads to a set of testable hypotheses. These are combined with a set of constraints, which determine the conditions under which a predictive model is actionable. The hypotheses are converted to data selection and cleaning strategies, that determine which variables to use in a predictive model, and how noise should be removed from these variables. The constraints determine which strategies are converted to predictive models, and which predictive models have sufficient predictive performance. The domain experts and decision makers finally determine which predictive model will be used as the basis for a prediction instrument. The main contribution of this thesis is a rigorous and transparent method for domain analysis as part of prediction instrument development. We have demonstrated that this method of soft-inclusive domain analysis leads to better predictive power than would be achieved with soft-exclusive domain analysis, through having a more complete view of what factors in the domain affect the prediction outcome. Furthermore, this method allows one to directly relate predictive power with the hypotheses, contributing to better domain understanding.
KW - METIS-317639
KW - IR-101042
U2 - 10.3990/1.9789036541749
DO - 10.3990/1.9789036541749
M3 - PhD Thesis - Research UT, graduation UT
SN - 978-90-365-4174-9
PB - Universiteit Twente
CY - Enschede
ER -

van der Spoel S. Prediction instrument development for complex domains. Enschede: Universiteit Twente, 2016. 224 p. https://doi.org/10.3990/1.9789036541749