Investigating Imputation Methods for Handling Missing Data

Jelle Maas, Job G.W.T. Römer, Işıl Baysal Erez, Maurice van Keulen

Research output: Contribution to conference › Paper › peer-review

Abstract

Missing data is a prevalent and pressing data quality issue hampering the usability of datasets. Straightforward methods, such as deleting rows with missing values, often do not suffice, especially if there are correlations between observed and missing values. Many data imputation methods have been proposed that use different ways of generating replacement values. Often, these methods have been evaluated on their ability to reconstruct the correct original values after randomly deleting values from a dataset without missing values. Better reconstruction performance, however, does not necessarily mean better suitability for a subsequent analytical task such as classification. Moreover, the assumption that missing values occur at random is often incorrect. In this paper, 9 more advanced data imputation methods were compared on both reconstruction performance and the performance of 11 machine learning methods on a subsequent classification task under varying conditions: 3 different datasets, different percentages of missing values, missing values in categorical and numerical attributes, and MCAR, MNAR, and MAR mechanisms for missing values to occur. The experiments show that an imputation method with worse reconstruction performance may indeed still be the best choice for the classification task at hand. Furthermore, a data imputation method based on Random Forest or Multilayer Perceptron appears to be a good overall choice in most circumstances, while the others present their own strengths and weaknesses.
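
The reconstruction-style evaluation described in the abstract (remove values from a complete dataset under a chosen mechanism, impute them, and compare against the originals) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the synthetic two-column data, the 1.5x/0.5x weighting, and the use of mean imputation are all assumptions made for illustration, and mean imputation is only a simple stand-in, not one of the nine methods compared in the paper. MCAR, MAR and MNAR stand for missing completely at random, missing at random, and missing not at random.

```python
import math
import random
import statistics


def make_mask(xs, ys, mechanism, rate, rng):
    """Return booleans marking which values of ys to remove (hypothetical helper)."""
    if mechanism not in ("MCAR", "MAR", "MNAR"):
        raise ValueError(f"unknown mechanism: {mechanism}")
    if mechanism == "MCAR":
        # Missingness is independent of every value in the data.
        return [rng.random() < rate for _ in ys]
    # MAR: missingness depends on the observed column xs.
    # MNAR: missingness depends on the value that goes missing (ys itself).
    driver = xs if mechanism == "MAR" else ys
    cutoff = statistics.median(driver)
    # Assumed weighting: rows above the median are three times as likely
    # to be removed; requires rate <= 2/3 so probabilities stay valid.
    return [rng.random() < (1.5 * rate if d > cutoff else 0.5 * rate)
            for d in driver]


def mean_impute(ys, mask):
    """Fill removed entries with the mean of the remaining ones (baseline only)."""
    observed = [y for y, m in zip(ys, mask) if not m]
    fill = statistics.mean(observed)
    return [fill if m else y for y, m in zip(ys, mask)]


def rmse_on_missing(truth, imputed, mask):
    """Reconstruction error, measured only on the entries that were removed."""
    errors = [(t - i) ** 2 for t, i, m in zip(truth, imputed, mask) if m]
    return math.sqrt(sum(errors) / len(errors))


# Synthetic complete data: ys is correlated with xs.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
ys = [x + rng.gauss(0.0, 0.5) for x in xs]

results = {}
for mechanism in ("MCAR", "MAR", "MNAR"):
    mask = make_mask(xs, ys, mechanism, rate=0.3, rng=rng)
    imputed = mean_impute(ys, mask)
    results[mechanism] = rmse_on_missing(ys, imputed, mask)
    print(mechanism, "removed:", sum(mask), "RMSE:", round(results[mechanism], 3))
```

The sketch covers only the reconstruction side. The paper's second criterion, performance on a subsequent classification task, would require training a classifier on the imputed data and scoring it on held-out data, which is omitted here.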
Original language: English
Number of pages: 22
Publication status: Published - 9 Nov 2023
Event: Joint International Scientific Conferences on AI and Machine Learning, BNAIC/BeNeLearn 2023 - Delft University of Technology, Delft, Netherlands
Duration: 8 Nov 2023 to 10 Nov 2023
https://bnaic2023.tudelft.nl/

Conference

Conference: Joint International Scientific Conferences on AI and Machine Learning, BNAIC/BeNeLearn 2023
Abbreviated title: BNAIC/BeNeLearn 2023
Country/Territory: Netherlands
City: Delft
Period: 8/11/23 to 10/11/23
