TY - JOUR
T1 - Using a stepwise approach to simultaneously develop and validate machine learning based prediction models
AU - Haalboom, M.
AU - Kort, S.
AU - van der Palen, J.
N1 - Funding Information: During the proof stage of this article, we received a Letter to the Editor by Heinze et al. We have responded to their Letter and made minor changes where necessary.
N1 - Publisher Copyright: © 2021
PY - 2022/2/1
Y1 - 2022/2/1
N2 - Accurate diagnosis of a disease is essential in healthcare. Prediction models based on classical regression techniques are widely used in clinical practice. Machine learning (ML) techniques might be preferred when there is a large amount of data per patient and a relatively limited number of subjects. However, this setting increases the risk of overfitting, which makes external validation imperative. At the same time, new and more efficient ML techniques are developed rapidly, so if recruiting patients for a validation study is time-consuming, the ML technique used to develop the first model might have been surpassed by more efficient techniques, rendering the original model no longer relevant. We demonstrate a stepwise design for simultaneous development and validation of ML-based prediction models. Within a single study, the design enables evaluation of the stability and robustness of a prediction model over increasing sample size, as well as assessment of the stability of sensitivity and specificity at a chosen cut-off. This will shorten the time to introduction of a new test in healthcare. Finally, we describe how to use regular clinical parameters in conjunction with ML-based predictions to further enhance differentiation between subjects with and without a disease.
AB - Accurate diagnosis of a disease is essential in healthcare. Prediction models based on classical regression techniques are widely used in clinical practice. Machine learning (ML) techniques might be preferred when there is a large amount of data per patient and a relatively limited number of subjects. However, this setting increases the risk of overfitting, which makes external validation imperative. At the same time, new and more efficient ML techniques are developed rapidly, so if recruiting patients for a validation study is time-consuming, the ML technique used to develop the first model might have been surpassed by more efficient techniques, rendering the original model no longer relevant. We demonstrate a stepwise design for simultaneous development and validation of ML-based prediction models. Within a single study, the design enables evaluation of the stability and robustness of a prediction model over increasing sample size, as well as assessment of the stability of sensitivity and specificity at a chosen cut-off. This will shorten the time to introduction of a new test in healthcare. Finally, we describe how to use regular clinical parameters in conjunction with ML-based predictions to further enhance differentiation between subjects with and without a disease.
KW - Diagnostic accuracy
KW - Machine learning
KW - Model stability
KW - Prediction model
KW - Validation
UR - http://www.scopus.com/inward/record.url?scp=85112576861&partnerID=8YFLogxK
U2 - 10.1016/j.jclinepi.2021.06.008
DO - 10.1016/j.jclinepi.2021.06.008
M3 - Article
C2 - 34157373
AN - SCOPUS:85112576861
SN - 0895-4356
VL - 142
SP - 305
EP - 310
JO - Journal of Clinical Epidemiology
JF - Journal of Clinical Epidemiology
ER -