The growing number of narrow spectral bands in hyperspectral remote sensing improves the capacity to describe and predict biological processes in ecosystems. But it also poses a challenge to fit empirical models based on such high dimensional data, which often contain correlated and noisy predictors. As sample sizes, to train and validate empirical models, seem not to be increasing at the same rate, overfitting has become a serious concern. Overly complex models lead to overfitting by capturing more than the underlying relationship, and also through fitting random noise in the data. Many regression techniques claim to overcome these problems by using different strategies to constrain complexity, such as limiting the number of terms in the model, by creating latent variables or by shrinking parameter coefficients. This paper is proposing a new method, named Naïve Overfitting Index Selection (NOIS), which makes use of artificially generated spectra, to quantify the relative model overfitting and to select an optimal model complexity supported by the data. The robustness of this new method is assessed by comparing it to a traditional model selection based on cross-validation. The optimal model complexity is determined for seven different regression techniques, such as partial least squares regression, support vector machine, artificial neural network and tree-based regressions using five hyperspectral datasets. The NOIS method selects less complex models, which present accuracies similar to the cross-validation method. The NOIS method reduces the chance of overfitting, thereby avoiding models that present accurate predictions that are only valid for the data used, and too complex to make inferences about the underlying process. Published by: International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS), 2017.
Hyperspectral remote sensing and ecological trait, Earth sciences