Large scale mining and evidence combination to support medical diagnosis

Ghita Berrada

Research output: Thesis › PhD Thesis - Research UT, graduation UT › Academic

Abstract

"Errare humanum est". Clinicians, however smart, caring and meticulous they may be, are only human, all too human. They are therefore bound to make occasional mistakes, diagnosis errors (i.e., delayed/missed or wrong diagnosis) in particular. With a prevalence of misdiagnosis of 15% in most areas of medicine, of which about 32% are due to errors in clinician assessments, misdiagnosis is a major if overlooked problem which seems to have systemic roots rather than just being the problem of a few isolated "bad apples". Assigning blame to individual health practitioners for diagnosis failures will not only fail to solve the systemic issues but will also ensure that the same avoidable mistakes are repeated. To devise solutions that minimize the occurrence of misdiagnoses and/or their impact, it is necessary to understand the root systemic causes of misdiagnosis since, as stated in the landmark 2000 Institute of Medicine report on medical errors, “Errors can be prevented by designing systems that make it hard for people to do the wrong thing and easy for people to do the right thing”. Misdiagnoses mostly occur because diagnosis is a decision-making process carried out under constraints that may conflict with the accuracy requirement. Such constraints include cost constraints, clinician time/availability and energy constraints, high data uncertainty, as well as the involvement of many clinicians/technicians in the process. Those constraints mean that clinicians have a hard time accessing all the patient data, which makes it difficult for them to gather all the clues and evidence needed to reach a correct diagnosis, and that they have to rely on sometimes faulty cognitive shortcuts. So how can we use database/data mining knowledge to support clinicians in the diagnosis process and obviate the risks posed by data fragmentation and by cognitive shortcuts and biases?
In response to this question, we designed a medical data sharing platform, using EEG (electroencephalogram) data as a representative example of medical data, to meet the following objectives:
  • sharing patient data and making it easily accessible
  • helping researchers help clinicians by providing a standard trove of data, so that (semi-)automated medical data interpretation methods are more easily comparable and reproducible, and by providing a data processing platform
  • making it easy to browse the data with similarity search requests (useful for differential diagnosis or diagnosis by comparison)
  • combining the evidence at hand to provide a set of diagnosis hypotheses and their attached likelihoods
We showed that Hadoop would be a suitable medical data sharing and processing platform. We also proposed a Dempster-Shafer based framework to combine the pieces of evidence obtained during the diagnosis process so as to generate and/or update the list of possible diagnoses and their likelihoods throughout the process. Finally, we designed a feature-based similarity metric to perform similarity search on EEG data.
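As an illustration of the evidence combination step mentioned in the abstract, the sketch below implements the textbook form of Dempster's rule of combination for two mass functions. It is not the framework proposed in the thesis, whose details are not given here. The function name `combine`, the candidate diagnoses, and all mass values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses (a focal element)
    to a mass in [0, 1]; the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Disjoint sets contribute to the conflict mass K.
            conflict += mass_a * mass_b
    if 1.0 - conflict < 1e-12:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Normalise by 1 - K so that the combined masses sum to 1 again.
    return {h: v / (1.0 - conflict) for h, v in combined.items()}


# Hypothetical frame of discernment and masses, for illustration only.
frame = frozenset({"epilepsy", "syncope", "migraine"})

# Assumed evidence from an EEG reading: some support for epilepsy,
# the remainder left uncommitted (assigned to the whole frame).
eeg_evidence = {frozenset({"epilepsy"}): 0.6, frame: 0.4}

# Assumed evidence from patient history: support for "epilepsy or syncope".
history_evidence = {frozenset({"epilepsy", "syncope"}): 0.7, frame: 0.3}

beliefs = combine(eeg_evidence, history_evidence)
for hypothesis_set, mass in sorted(beliefs.items(), key=lambda kv: -kv[1]):
    print(sorted(hypothesis_set), round(mass, 2))
```

In this made-up case the two sources never contradict each other, so no normalisation is needed: the combined mass is 0.60 on {epilepsy}, 0.28 on {epilepsy, syncope}, and 0.12 remains on the whole frame. Calling `combine` again with a further piece of evidence updates the list of hypotheses in the same way.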
Original language: English
Awarding Institution
  • University of Twente
Supervisors/Advisors
  • Apers, Peter Maria Gerardus, Supervisor
  • van Keulen, Maurice, Co-Supervisor
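Award date: 16 Jan 2015
Place of Publication: Enschede
Publisher: Centre for Telematics and Information Technology (CTIT)
Print ISBNs: 978-90-365-3825-1
DOIs: https://doi.org/10.3990/1.9789036538251
Publication status: Published - 16 Jan 2015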

Fingerprint

Electroencephalography
Medicine
Data mining
Decision making
Health
Availability
Processing
Costs
Uncertainty

Keywords

  • EWI-25666
  • EEG
  • evidence combination
  • Hadoop
  • IR-93887
  • METIS-308485
  • medical data sharing platform
  • Dempster-Shafer theory
  • similarity search
  • misdiagnosis

Cite this

Berrada, G. (2015). Large scale mining and evidence combination to support medical diagnosis. Enschede: Centre for Telematics and Information Technology (CTIT). https://doi.org/10.3990/1.9789036538251
Berrada, Ghita. / Large scale mining and evidence combination to support medical diagnosis. Enschede : Centre for Telematics and Information Technology (CTIT), 2015. 147 p.
@phdthesis{70336b93124345a8b0343e4322ac453e,
title = "Large scale mining and evidence combination to support medical diagnosis",
keywords = "EWI-25666, EEG, evidence combination, Hadoop, IR-93887, METIS-308485, medical data sharing platform, Dempster-Shafer theory, similarity search, misdiagnosis",
author = "Ghita Berrada",
note = "75{\%} for TNW-NIM (=MD{\&}I) and 25{\%} for EWI-DB",
year = "2015",
month = "1",
day = "16",
doi = "10.3990/1.9789036538251",
language = "English",
isbn = "978-90-365-3825-1",
publisher = "Centre for Telematics and Information Technology (CTIT)",
address = "Netherlands",
school = "University of Twente",

}
