Predicting the occurrence and spread of harmful microorganisms with statistical models

J.M. van Niekerk*

*Corresponding author for this work

Research output: Thesis › PhD Thesis - Research UT, graduation UT


Statistical models are essential to understand the occurrence and spread of microbes to support decision-making in microbiology and epidemiology. Antimicrobial resistance (AMR) is a multifaceted global problem and a significant threat to sustainable modern healthcare. This thesis aims to identify knowledge gaps in the AMR research field and explain the added value of using statistical models and novel spatiotemporal data to predict and identify risk factors for the occurrence and spread of AMR.
Strategic action plans to tackle the increasing international threat of AMR are based upon research agendas that are informed by knowledge gaps in the AMR research field. Currently, these knowledge gaps are identified manually and are often subjective. Chapter 2 describes how a bibliometric, data-driven methodology can be used to identify knowledge gaps in AMR research. To this end, twenty years of AMR-related articles were extracted using the PubMed search engine. With structural topic modelling I identified the topics comprising the AMR research field, while topic clusters were created using hierarchical clustering on the topic proportions. Potential AMR knowledge gaps were obtained using Spearman’s correlation between topic clusters and topics and between individual topics. A total of 88 topics and seven topic clusters were identified from 158 616 scientific AMR research articles. In total, 421 potential knowledge gaps were identified between the topic clusters and topics and 2 663 between individual topics. Key knowledge gaps between molecular and laboratory AMR research were highlighted. At the topic level, knowledge gaps were highlighted between AMR research regarding water and the environment and both institutional and international surveillance topics. These results provide an innovative, data-driven way to identify knowledge gaps in AMR research.
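The correlation step in Chapter 2 can be illustrated with a minimal sketch. It computes Spearman's correlation between the per-document proportions of each pair of topics and flags weakly or negatively correlated pairs as candidate gaps. The flagging rule (`threshold=0.0`), the function names and the toy proportions are assumptions made for illustration; the abstract does not state the criterion used in the thesis.

```python
from itertools import combinations

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def candidate_gaps(topic_proportions, threshold=0.0):
    """Return topic pairs whose rho is below threshold (assumed gap rule)."""
    gaps = []
    for a, b in combinations(sorted(topic_proportions), 2):
        rho = spearman(topic_proportions[a], topic_proportions[b])
        if rho < threshold:
            gaps.append((a, b, round(rho, 3)))
    return gaps

# Hypothetical data: proportion of each topic in five documents.
toy = {
    "molecular": [0.70, 0.10, 0.60, 0.05, 0.20],
    "laboratory": [0.60, 0.20, 0.50, 0.10, 0.30],
    "surveillance": [0.05, 0.60, 0.10, 0.70, 0.40],
}
gaps = candidate_gaps(toy)
```

In this toy input, topics that rarely appear in the same documents are returned as candidate gaps, while topics that rise and fall together are not.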
Surgical site infections (SSI) make up 19.6% of healthcare-associated infections (HAIs) in Europe [98]. Risk factor identification studies for the occurrence of SSI do not usually specify how cut-offs for continuous variables are determined. In most cases, they use standard medical cut-offs without considering the data being studied. Chapter 3 identifies the risk factors for the occurrence of SSI for digestive, thoracic and orthopaedic system surgeries using standard medical and data-driven cut-off values. Retrospective surgical procedure data, individual electronic health records, pharmaceutical data and laboratory data were used from the Erasmus MC University Medical Centre in The Netherlands. Risk factors for the occurrence of SSI were identified using a multivariate forward-step logistic regression model. Standard medical cut-off values were compared with cut-offs determined from the data. For digestive, orthopaedic and thoracic system surgical procedures, the risk factors identified for the occurrence of SSI were a preoperative temperature of 38 °C and antibiotics used at the time of surgery. C-reactive protein (CRP) and the duration of the surgery were identified as risk factors for digestive surgical procedures. Being an adult (age ≥ 18) was identified as a protective factor for thoracic surgical procedures. Data-driven cut-off values identified for temperature, age and CRP explained the occurrence of SSI up to 19.5% better than standard medical cut-off values. Future studies should investigate whether data-driven cut-offs can add value in explaining the clinical outcome being modelled, rather than relying solely on standard medical cut-off values for continuous variables to identify risk factors.
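One common way to derive a cut-off from the data is to scan the observed values and keep the threshold that best separates the two outcome groups. The sketch below uses Youden's J (sensitivity + specificity - 1) as the criterion. This is an assumed, swapped-in technique for illustration only: the abstract does not say how the thesis determined its data-driven cut-offs, and the temperatures and outcomes below are hypothetical.

```python
def youden_j(values, outcomes, cutoff):
    """Youden's J for the rule 'value >= cutoff predicts the outcome'."""
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both outcome classes")
    tp = sum(1 for v, y in zip(values, outcomes) if v >= cutoff and y == 1)
    fp = sum(1 for v, y in zip(values, outcomes) if v >= cutoff and y == 0)
    # sensitivity + specificity - 1 simplifies to TPR - FPR
    return tp / positives - fp / negatives

def data_driven_cutoff(values, outcomes):
    """Scan every observed value and return the (cutoff, J) with the highest J."""
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(values)):
        j = youden_j(values, outcomes, cutoff)
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Hypothetical preoperative temperatures (°C) and SSI outcomes (1 = infection).
temps = [36.5, 36.8, 37.0, 37.2, 37.4, 37.6, 37.9, 38.1]
ssi = [0, 0, 0, 0, 1, 0, 1, 1]

cutoff, j_data = data_driven_cutoff(temps, ssi)
j_standard = youden_j(temps, ssi, 38.0)  # a fixed, conventional threshold
```

Comparing `j_data` with `j_standard` mirrors, in a very reduced form, the comparison between data-driven and standard medical cut-offs described above.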
Transmission of harmful microorganisms (HMO) poses a major threat to patients and healthcare workers in healthcare settings. The most effective countermeasure against these transmissions is adherence to hand hygiene policies, but adherence rates are relatively low and vary over space and time. The spatiotemporal effects of varying levels of hand hygiene compliance on the transmission and spread of hand-transmitted HMO in a closed healthcare setting must still be quantified. Chapter 4 identifies the healthcare worker occupation group of potential super-spreaders and, using the spatiotemporal movements of this group, quantifies the spatiotemporal effects of varying levels of hand hygiene compliance (HHC) on the hand transmission of HMO. Spatiotemporal data were collected in the University Medical Center Groningen (UMCG) using radio frequency identification technology. The effects of five probability distributions of HHC and three harmful microorganism transmission rates were simulated using a dynamic agent-based simulation model. The effects of initial simulation assumptions on the simulation results were quantified using five risk outcomes. Nurses were identified as the potential super-spreader healthcare worker occupation group. With low HHC (5%) and a high transmission rate (5% per contact moment), a colonised nurse can transfer microbes to three of the 17 healthcare workers or patients encountered during the 98.4 minutes of visiting 23 rooms while colonised. The HMO transmission potential for nurses is higher during weeknights (5 pm – 7 am) and weekends than during weekdays (7 am – 5 pm). The spatiotemporal behaviour and social mixing patterns of healthcare workers can change the expected number of hand transmissions and the spread of HMO by super-spreaders in a closed healthcare setting. These insights can be used to evaluate spatiotemporal safety behaviours and develop infection prevention and control strategies.
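The interplay between hand hygiene compliance and transmission rate can be illustrated with a deliberately simplified Monte Carlo sketch. It is not the agent-based model of the thesis, which draws on recorded movements and several HHC distributions, so its numbers should not be compared with the reported results. The function name, the parameter values and the assumption that compliant hand hygiene blocks transmission entirely are all hypothetical.

```python
import random

def simulate_transmissions(n_contacts, hhc, transmission_rate, runs=10_000, seed=1):
    """Mean number of transmissions by one colonised worker over n_contacts.

    Simplifying assumption: at each contact the worker performs hand hygiene
    with probability hhc, which blocks transmission entirely; otherwise
    transmission occurs with probability transmission_rate.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        for _ in range(n_contacts):
            compliant = rng.random() < hhc
            if not compliant and rng.random() < transmission_rate:
                total += 1
    return total / runs

# Hypothetical scenario: 20 contacts, 10% transmission chance per contact.
low_compliance = simulate_transmissions(20, hhc=0.1, transmission_rate=0.1)
high_compliance = simulate_transmissions(20, hhc=0.9, transmission_rate=0.1)
```

Under these assumptions the expected count is `n_contacts * (1 - hhc) * transmission_rate`, so raising compliance lowers the mean number of transmissions proportionally.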
Vancomycin-resistant enterococci (VRE) can cause severe patient health and monetary burdens. The odds of a hospital patient acquiring VRE increase when using antibiotics and when prior room occupants had VRE, but the antibiotic use of prior room occupants is often neglected. Chapter 5 describes how the occurrence and spread of VRE can be explained using intrahospital patient movements (IPM) and their antibiotic use between hospital wards. Retrospective IPM, antibiotic use and PCR screening data were used from a hospital in the Netherlands. A dynamic directed spatiotemporal graph was developed and used, together with the PageRank algorithm, to calculate two daily centrality measures that summarise the flow of patients and antibiotics at the ward level. With a decision tree and random forest model I predicted the daily occurrence of VRE for every ward and compared the models’ performance using a 30% test sample. The decision tree model produced a simple set of rules that can be used to determine the daily probability of VRE occurrence for each hospital ward. The decision tree model achieved an area under the curve of 0.685 and the random forest model 0.886 on the test set. These results confirm that the random forest model performs better than a single decision tree for all levels of model sensitivity and specificity, at the cost of model simplicity. These results can be used to develop an early warning system for VRE and to further inform infection prevention plans and outbreak strategies.
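The centrality step in Chapter 5 can be sketched as a weighted PageRank computed by power iteration on a directed ward graph. The edge weights stand in for a daily flow between wards, such as the number of patients transferred. The ward names, the weights, the damping factor of 0.85 and the even redistribution of rank from wards without outgoing transfers are assumptions for illustration, not details taken from the thesis.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank by power iteration.

    edges maps (source, target) -> non-negative weight, e.g. the number of
    patient transfers from one ward to another on a given day.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_weight = {n: 0.0 for n in nodes}
    for (src, _), w in edges.items():
        out_weight[src] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        # Rank held by wards with no outgoing transfers is spread evenly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new = {n: base for n in nodes}
        for (src, dst), w in edges.items():
            if w > 0:
                new[dst] += damping * rank[src] * w / out_weight[src]
        converged = sum(abs(new[n] - rank[n]) for n in nodes) < tol
        rank = new
        if converged:
            break
    return rank

# Hypothetical transfers on one day: (from ward, to ward) -> patient count.
transfers = {
    ("ER", "ICU"): 5,
    ("ER", "Surgery"): 3,
    ("Surgery", "ICU"): 4,
    ("ICU", "Ward A"): 2,
}
centrality = pagerank(transfers)
```

Recomputing such scores per day, once with patient counts and once with an antibiotic-use weight, would yield ward-level features of the kind that a decision tree or random forest could take as input.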
In summary, this thesis shows that data-driven statistical models can improve our understanding of antimicrobial resistance. It considers how different sources of spatiotemporal data may be used to predict the occurrence and spread of AMR in hospitals.
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • Faculty of Geo-Information Science and Earth Observation
  • University of Twente
  • Stein, Alfred, Supervisor
  • van Gemert-Pijnen, Lisette J.E.W.C., Supervisor
  • Braakman-Jansen, Annemarie L.M.A., Co-Supervisor
Award date: 1 Dec 2021
Place of Publication: Enschede
Print ISBNs: 978-90-365-5294-3
Publication status: Published - 1 Dec 2021

