TY - JOUR
T1 - A two-point machine learning method for the spatial prediction of soil pollution
AU - Gao, Bingbo
AU - Stein, A.
AU - Wang, Jinfeng
N1 - Funding Information: This work was funded by the National Key R&D Program of China through grant 2021YFE0102300 and the Open Research Fund of Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University. Publisher Copyright: © 2022 The Author(s)
PY - 2022/4
Y1 - 2022/4
N2 - Heavy metal soil pollution is a worldwide problem. It is affected by many natural and human factors through heterogeneous relationships. Accurate prediction at unobserved locations using a limited number of observations hence remains a challenge. This study proposes a two-point machine learning method to fully utilize the information in spatial neighbors and high-dimensional covariates to improve prediction accuracy. It models the difference between pairs of points, predicts concentration differences between observation points and unobserved points, and uses those for neighbor selection. This supervised learning method integrates both spatial autocorrelation and property similarity. Method performance, illustrated in a case study of soil Pb, confirms that our method can greatly improve prediction accuracy for different sample sizes. The improvements vary with the sample size and have a decreasing trend as the sample size increases. Compared with ordinary kriging, kriging with external drift, random forest, and random forest-based regression kriging, the average improvements on RMSE are 1.49, 0.95, 0.93 and 0.62 respectively, and on MAE are 1.29, 1.17, 0.87 and 0.65 respectively. In the future, the method may be applied to the spatial prediction of other variables of the earth system, while the supervised learning method can be adjusted to new applications.
KW - Soil heavy metal
KW - Spatial heterogeneity
KW - Spatial prediction
KW - Two point machine learning
KW - ITC-ISI-JOURNAL-ARTICLE
KW - ITC-GOLD
UR - https://ezproxy2.utwente.nl/login?url=https://library.itc.utwente.nl/login/2022/isi/stein_two.pdf
U2 - 10.1016/j.jag.2022.102742
DO - 10.1016/j.jag.2022.102742
M3 - Article
AN - SCOPUS:85126366679
VL - 108
SP - 1
EP - 10
JO - International Journal of Applied Earth Observation and Geoinformation (JAG)
JF - International Journal of Applied Earth Observation and Geoinformation (JAG)
SN - 1569-8432
M1 - 102742
ER -