Abstract
Missing data is a prevalent problem in data science across many fields, such as the natural, social, and health sciences. Since most regression methods cannot handle missing data directly, imputation methods are used during data pre-processing. Finding the best imputation method is non-trivial, however. Moreover, our results show that choosing the best imputation method independently does not always yield the best predictive performance in the end; the combination of imputer and regressor matters. Furthermore, search-based approaches for finding a best-fitting imputer/regressor pair can be computationally intensive. In this paper, we propose the MetaLIRS (Meta Learning Imputation and Regression Selection) framework for developing resource-friendly ML-based recommendation models for method selection. With MetaLIRS, we constructed a proof-of-concept recommendation model based on 12 meta-features that achieves an accuracy of 63% in selecting the best-fitting imputer/regressor pair. A data scientist can use this model to obtain a quick, resource-friendly recommendation of which imputation and regression method to use for their particular data set and task, without the need for an expensive grid search among methods.
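The abstract contrasts an exhaustive search over imputer/regressor pairs with a learned recommender driven by data-set meta-features. The sketch below illustrates only that general meta-learning idea, under explicit assumptions: the three meta-features, the 1-nearest-neighbour rule, the knowledge-base entries and the method names (`"mean"`, `"knn"`, etc.) are all hypothetical placeholders. They are not the 12 meta-features, the model, or any results from MetaLIRS.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Rows of feature values; None marks a missing entry (assumed representation).
Dataset = Sequence[Sequence[Optional[float]]]
# Hypothetical knowledge base: meta-features of previously evaluated data sets,
# each labelled with the imputer/regressor pair that performed best on it.
KnowledgeBase = List[Tuple[Tuple[float, ...], Tuple[str, str]]]


def meta_features(X: Dataset) -> Tuple[float, float, float]:
    """Three simple, illustrative meta-features (NOT the paper's 12)."""
    n_rows = len(X)
    n_cols = len(X[0]) if n_rows else 0
    cells = n_rows * n_cols
    n_missing = sum(v is None for row in X for v in row)
    rows_with_missing = sum(any(v is None for v in row) for row in X)
    return (
        math.log1p(n_rows),                              # data-set size, log-scaled
        n_missing / cells if cells else 0.0,             # overall missing rate
        rows_with_missing / n_rows if n_rows else 0.0,   # share of incomplete rows
    )


def recommend(X: Dataset, kb: KnowledgeBase) -> Tuple[str, str]:
    """Return the pair stored for the nearest known data set in meta-feature space."""
    query = meta_features(X)
    _, best_pair = min(kb, key=lambda entry: math.dist(query, entry[0]))
    return best_pair


# Toy, made-up knowledge base for illustration only (not experimental results).
kb: KnowledgeBase = [
    ((math.log1p(100), 0.05, 0.20), ("mean", "linear")),
    ((math.log1p(10_000), 0.30, 0.90), ("knn", "random_forest")),
]
X_new = [[1.0, None], [2.0, 3.0], [None, 4.0], [5.0, 6.0]]
print(recommend(X_new, kb))
```

By comparison, a grid search over I imputers and R regressors with k-fold cross-validation needs I × R × k model fits per data set, whereas this kind of recommender only computes meta-features and performs one lookup. In practice the meta-features would typically be standardized first, so that no single feature (here the log-scaled size) dominates the distance.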
| Original language | English |
|---|---|
| Title of host publication | Intelligent Data Engineering and Automated Learning - IDEAL 2024 |
| Subtitle of host publication | 25th International Conference, Valencia, Spain, November 20-22, 2024. Proceedings, Part I |
| Editors | Vicente Julian, David Camacho, Hujun Yin, Juan M. Alberola, Vitor Beires Nogueira, Paulo Novais, Antonio Tallón-Ballesteros |
| Place of Publication | Cham, Switzerland |
| Publisher | Springer |
| Pages | 155-166 |
| Number of pages | 12 |
| ISBN (Electronic) | 978-3-031-77731-8 |
| ISBN (Print) | 978-3-031-77730-1 |
| DOIs | |
| Publication status | Published - 14 Nov 2024 |
| Event | 25th International Conference on Intelligent Data Engineering and Automated Learning - IDEAL 2024 - Valencia, Spain; Duration: 20 Nov 2024 → 22 Nov 2024; Conference number: 25 |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Publisher | Springer |
| Volume | 15346 |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 25th International Conference on Intelligent Data Engineering and Automated Learning - IDEAL 2024 |
|---|---|
| Abbreviated title | IDEAL 2024 |
| Country/Territory | Spain |
| City | Valencia |
| Period | 20/11/24 → 22/11/24 |
Keywords
- 2024 OA procedure