TY - JOUR
T1 - Improving GALDIT-based groundwater vulnerability predictive mapping using coupled resampling algorithms and machine learning models
AU - Barzegar, Rahim
AU - Razzagh, Siamak
AU - Quilty, John
AU - Adamowski, Jan
AU - Kheyrollah Pour, Homa
AU - Booij, Martijn J.
PY - 2021/7/1
Y1 - 2021/7/1
N2 - Developing accurate groundwater vulnerability maps is important for the sustainable management of groundwater resources. In this research, resampling methods [e.g., Bootstrap Aggregating (BA) and Disjoint Aggregating (DA)] are combined with machine learning (ML) models, namely eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), Adaptive Boosting (AdaBoost), Categorical Boosting (CatBoost), and Random Forest (RF), to improve the GALDIT groundwater vulnerability mapping framework that considers Groundwater occurrence (G) (i.e., aquifer type), Aquifer hydraulic conductivity (A), depth to groundwater Level (L), Distance from the seashore (D), Impact of existing seawater intrusion status (I), and aquifer Thickness (T). The proposed approach overcomes the subjectivity of the weights and ratings given to the six variables in the GALDIT framework (via the ML methods) and helps address the small dataset issue (via resampling methods) common to groundwater vulnerability predictive mapping. Considering the Shabestar Plain aquifer, situated in the northeast of Lake Urmia (Iran), the predicted vulnerability indices from GALDIT were adjusted using total dissolved solids (TDS, an indicator of drinking water quality) concentrations, and were modeled by the ML models. Pearson’s correlation coefficient (r) and distance correlation (DC) between the predicted vulnerability indices and TDS were used to validate the models. Using a validation set, the GALDIT framework (r = 0.447 and DC = 0.511) was compared against the best performing standalone (XGBoost-GALDIT, r = 0.613, DC = 0.647) and coupled resampling (BA-XGBoost-GALDIT, r = 0.659, DC = 0.699 and DA-RF-GALDIT, r = 0.616, DC = 0.662) ML models, revealing that the proposed framework significantly increases r and DC metrics. In general, the BA resampling method led to better performing ML models than DA. However, in all cases, it was found that integrating resampling methods with ML models is a promising approach to improving the accuracy of GALDIT vulnerability models.
AB - Developing accurate groundwater vulnerability maps is important for the sustainable management of groundwater resources. In this research, resampling methods [e.g., Bootstrap Aggregating (BA) and Disjoint Aggregating (DA)] are combined with machine learning (ML) models, namely eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), Adaptive Boosting (AdaBoost), Categorical Boosting (CatBoost), and Random Forest (RF), to improve the GALDIT groundwater vulnerability mapping framework that considers Groundwater occurrence (G) (i.e., aquifer type), Aquifer hydraulic conductivity (A), depth to groundwater Level (L), Distance from the seashore (D), Impact of existing seawater intrusion status (I), and aquifer Thickness (T). The proposed approach overcomes the subjectivity of the weights and ratings given to the six variables in the GALDIT framework (via the ML methods) and helps address the small dataset issue (via resampling methods) common to groundwater vulnerability predictive mapping. Considering the Shabestar Plain aquifer, situated in the northeast of Lake Urmia (Iran), the predicted vulnerability indices from GALDIT were adjusted using total dissolved solids (TDS, an indicator of drinking water quality) concentrations, and were modeled by the ML models. Pearson’s correlation coefficient (r) and distance correlation (DC) between the predicted vulnerability indices and TDS were used to validate the models. Using a validation set, the GALDIT framework (r = 0.447 and DC = 0.511) was compared against the best performing standalone (XGBoost-GALDIT, r = 0.613, DC = 0.647) and coupled resampling (BA-XGBoost-GALDIT, r = 0.659, DC = 0.699 and DA-RF-GALDIT, r = 0.616, DC = 0.662) ML models, revealing that the proposed framework significantly increases r and DC metrics. In general, the BA resampling method led to better performing ML models than DA. However, in all cases, it was found that integrating resampling methods with ML models is a promising approach to improving the accuracy of GALDIT vulnerability models.
KW - 2022 OA procedure
U2 - 10.1016/j.jhydrol.2021.126370
DO - 10.1016/j.jhydrol.2021.126370
M3 - Article
SN - 0022-1694
VL - 598
JO - Journal of Hydrology
JF - Journal of Hydrology
M1 - 126370
ER -