Spatial+: A new cross-validation method to evaluate geospatial machine learning models

Yanwen Wang*, Mahdi Khodadadzadeh, Raúl Zurita-Milla

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


Abstract

Random cross-validation (CV) is often used to evaluate geospatial machine learning models, particularly when only a limited sample is available and collecting an additional test set is infeasible. However, the prediction locations can differ substantially from the available sample, which leads to over-optimistic evaluation results. This has fostered the development of spatial CV methods. Yet these methods focus only on spatial autocorrelation and cannot guarantee that the validation subset is a good proxy for a test set that differs significantly from the sample. In this paper, we propose the spatial+ cross-validation (SP-CV) method. This method, which considers both the geographic and the feature spaces, consists of two stages. The first stage addresses spatial autocorrelation by using agglomerative hierarchical clustering to divide the available sample into blocks. The second stage deals with multiple sources of difference: it uses cluster ensembles to split the blocks into training and validation folds based on the locations of the sample data and the values of the covariates and the target variable. The proposed method is compared against random and block CV methods in a series of experiments with Amazon basin aboveground biomass and California house price datasets. Our results show that SP-CV yielded the smallest differences between the estimated error and the reference error. This means that SP-CV produced more representative splits and led to more reliable model evaluations. It suggests that reliable model evaluation requires considering both the geographic and the feature spaces in a comprehensive manner.
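The two-stage procedure summarized above can be illustrated with a small, dependency-free Python sketch. It rests on assumptions that the abstract does not state: stage 1 is taken to cluster the sample coordinates with average-linkage agglomerative clustering; stage 2 summarizes each block by the mean of its coordinates, covariates and target, clusters each of those three views separately, and combines the three base partitions with a co-association (evidence-accumulation) consensus, which is one common form of cluster ensemble and may differ from the one used in the paper. The function names `agglomerate`, `standardize`, `block_means` and `spatial_plus_folds` are hypothetical, and this is not the authors' implementation.

```python
import math
import random


def agglomerate(points, n_clusters, dist=None):
    """Naive average-linkage agglomerative clustering (O(n^3), small inputs only).

    Pass either `points` (Euclidean distances are computed) or a precomputed
    n x n `dist` matrix. Returns one integer label per item, 0..n_clusters-1.
    """
    if dist is None:
        dist = [[math.dist(a, b) for b in points] for a in points]
    n = len(dist)
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between two clusters.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def standardize(rows):
    """Z-score each column so no single variable dominates the distances."""
    out_cols = []
    for col in zip(*rows):
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col)) or 1.0
        out_cols.append([(v - mu) / sd for v in col])
    return [list(r) for r in zip(*out_cols)]


def block_means(rows, labels, n_blocks):
    """Mean feature vector of the samples assigned to each block."""
    sums = [[0.0] * len(rows[0]) for _ in range(n_blocks)]
    counts = [0] * n_blocks
    for row, lab in zip(rows, labels):
        counts[lab] += 1
        for k, v in enumerate(row):
            sums[lab][k] += v
    return [[s / counts[b] for s in sums[b]] for b in range(n_blocks)]


def spatial_plus_folds(coords, covariates, target, n_blocks, n_folds):
    """Hypothetical SP-CV-style splitter; requires n_folds <= n_blocks <= len(coords)."""
    # Stage 1 (assumed): spatial blocks from the sample coordinates only.
    block_of = agglomerate(coords, n_blocks)
    # Stage 2 (assumed): one block-level view per source of difference.
    views = [
        block_means(coords, block_of, n_blocks),
        block_means(covariates, block_of, n_blocks),
        block_means([[t] for t in target], block_of, n_blocks),
    ]
    # Base partitions: cluster the blocks separately in each view.
    partitions = [agglomerate(standardize(v), n_folds) for v in views]
    # Co-association: fraction of base partitions placing blocks i and j together.
    co = [[sum(p[i] == p[j] for p in partitions) / len(partitions)
           for j in range(n_blocks)] for i in range(n_blocks)]
    # Consensus: cluster blocks on 1 - co-association into the final folds.
    fold_of_block = agglomerate(None, n_folds, dist=[[1.0 - c for c in row] for row in co])
    # Every sample inherits the fold of its block, so blocks are never split.
    return [fold_of_block[b] for b in block_of]


if __name__ == "__main__":
    rng = random.Random(0)  # synthetic, hypothetical data for illustration only
    coords = [[rng.uniform(0, 100), rng.uniform(0, 100)] for _ in range(40)]
    covariates = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
    target = [c[0] + 0.02 * xy[0] for c, xy in zip(covariates, coords)]
    folds = spatial_plus_folds(coords, covariates, target, n_blocks=8, n_folds=4)
    for k in range(4):
        print("fold", k, "size", folds.count(k))
```

Each fold would then be held out once as the validation set while the model is trained on the remaining folds. Grouping similar blocks into the same fold, rather than mixing them, is meant to make each validation fold differ from its training folds in both geographic and feature space; this reading of the second stage is an interpretation, not a detail confirmed by the abstract.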

Original language: English
Article number: 103364
Journal: International Journal of Applied Earth Observation and Geoinformation
Volume: 121
Publication status: Published - Jul 2023

Keywords

  • Cross-validation
  • Data-driven models
  • Feature space
  • Model evaluation
  • Spatial autocorrelation

