Large scale mining and evidence combination to support medical diagnosis

Ghita Berrada

Research output: Thesis (PhD Thesis - Research UT, graduation UT)


"Errare humanum est". Clinicians, however smart, caring and meticulous they may be, are only human, all too human. They are therefore bound to make occasional mistakes, diagnostic errors in particular (i.e., delayed, missed or wrong diagnoses). With a prevalence of 15% in most areas of medicine, of which about 32% is due to errors in clinician assessment, misdiagnosis is a major, if overlooked, problem that seems to have systemic roots rather than being the problem of a few isolated "bad apples". Assigning blame to individual health practitioners for diagnosis failures will not only fail to solve the systemic issues but will also ensure that the same avoidable mistakes are repeated. To devise solutions that minimize the occurrence of misdiagnoses and/or their impact, it is necessary to understand the systemic root causes of misdiagnosis since, as stated in the landmark 2000 Institute of Medicine report on medical errors, “Errors can be prevented by designing systems that make it hard for people to do the wrong thing and easy for people to do the right thing”. Misdiagnoses mostly occur because diagnosis is a decision-making process carried out under constraints that may conflict with the accuracy requirement. Such constraints include cost, clinician time, availability and energy, high data uncertainty, as well as the involvement of many clinicians and technicians in the process. Because of those constraints, clinicians have a hard time accessing all the patient data, which makes it difficult for them to gather all the clues and evidence needed to reach a correct diagnosis, and they have to rely on cognitive shortcuts that are sometimes faulty. So how can we use database and data mining knowledge to support clinicians in the diagnosis process and mitigate the risks posed by data fragmentation, cognitive shortcuts and biases?
In response to this question, we designed a medical data sharing platform, using EEG (electroencephalogram) data as a representative example of medical data, to meet the following objectives:
  • sharing patient data and making it easily accessible
  • helping researchers help clinicians by providing a standard trove of data, so that (semi-)automated medical data interpretation methods are more easily comparable and reproducible, and by providing a data processing platform
  • making it easy to browse the data with similarity search requests (useful for differential diagnosis or diagnosis by comparison)
  • combining the evidence at hand to provide a set of diagnosis hypotheses and their attached likelihoods
We showed that Hadoop would be a suitable medical data sharing and processing platform. We also proposed a Dempster-Shafer-based framework to combine the pieces of evidence obtained during the diagnosis process so as to generate and/or update the list of possible diagnoses and their likelihoods throughout the process. Finally, we designed a feature-based similarity metric to perform similarity search on EEG data.
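To make the idea of evidence combination more concrete, the following is a minimal sketch of the textbook Dempster's rule of combination, the core operation of Dempster-Shafer theory. It is not taken from the thesis and may differ from the framework proposed there. The function name `combine`, the candidate diagnoses and all mass values are hypothetical and chosen purely for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses (a subset of the
    frame of discernment) to a mass; the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Both pieces of evidence agree on the hypotheses in `common`.
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Disjoint focal sets: this product counts as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise so the remaining masses sum to 1 again.
    return {subset: mass / (1.0 - conflict) for subset, mass in combined.items()}


# Hypothetical frame of discernment (illustrative diagnoses only).
FRAME = frozenset({"epilepsy", "syncope", "migraine"})

# Hypothetical evidence 1: supports epilepsy, the rest is left uncommitted.
evidence_1 = {frozenset({"epilepsy"}): 0.6, FRAME: 0.4}
# Hypothetical evidence 2: narrows the options down to epilepsy or syncope.
evidence_2 = {frozenset({"epilepsy", "syncope"}): 0.7, FRAME: 0.3}

updated = combine(evidence_1, evidence_2)
for subset, mass in sorted(updated.items(), key=lambda item: -item[1]):
    print(sorted(subset), round(mass, 2))
```

Because the rule is associative and commutative, each new piece of evidence can be folded into the current result with another call to `combine`, which matches the idea of updating the list of hypotheses throughout the diagnosis process.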
Original language: English
Awarding Institution
  • University of Twente
  • Apers, Peter Maria Gerardus, Supervisor
  • van Keulen, Maurice, Co-Supervisor
Award date: 16 Jan 2015
Place of Publication: Enschede
Print ISBNs: 978-90-365-3825-1
Publication status: Published - 16 Jan 2015


Keywords
  • EWI-25666
  • EEG
  • evidence combination
  • Hadoop
  • IR-93887
  • METIS-308485
  • medical data sharing platform
  • Dempster-Shafer theory
  • similarity search
  • misdiagnosis

