The impact of different sampling strategies on landslide susceptibility assessment: an explainable hybrid BO-XGBoost model

  • Peng Wang
  • Hongwei Deng*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Landslide susceptibility assessment (LSA) is an effective method for landslide risk management, yet accurately delineating landslide boundaries is often challenging, particularly over large areas. To enhance the accuracy of LSA, researchers have employed various methods to expand landslide samples. However, the impact of different landslide sampling strategies on LSA has not been thoroughly explored. This study uses a Bayesian optimization (BO) algorithm to optimize the extreme gradient boosting (XGBoost) model and combines it with SHapley Additive exPlanations (SHAP) to propose an interpretable hybrid machine learning approach for assessing the impact of four landslide sampling strategies on LSA. Focusing on the Three Gorges Reservoir Area (TGRA), we constructed four landslide datasets using point, square, circle, and polygon sampling methods, and selected 12 influencing factors through correlation analysis and a multicollinearity test. These datasets were randomly split into training and testing sets in a 7:3 ratio and analyzed using various machine learning models. Results show that the BO-XGBoost model performed exceptionally well on the square, circle, and polygon datasets, achieving an AUC of 0.991 and a prediction accuracy of 0.963 on the polygon dataset. Furthermore, cross-validation indicated that the XGBoost model on the polygon dataset achieved the highest mean AUC (0.984), with a standard deviation (SD) of 0.003. Frequency ratio (FR) analysis of the different landslide susceptibility maps (LSMs) revealed that the FR values for high-susceptibility areas in the polygon dataset were consistently above 1, highlighting its effectiveness in distinguishing landslide susceptibility levels. SHAP interpretation of the BO-XGBoost models across the datasets indicates that the polygon-based dataset more accurately captures the true characteristics of landslides, thereby providing more reliable samples for landslide susceptibility modeling. This study provides valuable insights for future LSAs, particularly in addressing the challenge of landslide sampling under limited inventory conditions.
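
For readers unfamiliar with the frequency ratio mentioned in the abstract, the following is a minimal sketch, assuming the commonly used definition: the share of landslide cells that fall in a susceptibility class divided by the share of all cells in that class. The abstract does not spell out the authors' exact formulation, and the function name, class labels, and toy data are hypothetical illustrations, not values from the study.

```python
from collections import Counter


def frequency_ratio(class_labels, is_landslide):
    """Return the frequency ratio (FR) for each susceptibility class.

    Assumed definition (not taken from the paper):
    FR = (landslide cells in class / all landslide cells)
         / (cells in class / all cells)
    """
    if len(class_labels) != len(is_landslide):
        raise ValueError("inputs must have the same length")

    total_cells = len(class_labels)
    total_landslides = sum(1 for flag in is_landslide if flag)
    if total_cells == 0 or total_landslides == 0:
        raise ValueError("need at least one cell and one landslide cell")

    # Count all cells and landslide cells per class.
    cells_per_class = Counter(class_labels)
    landslides_per_class = Counter(
        label for label, flag in zip(class_labels, is_landslide) if flag
    )

    return {
        label: (landslides_per_class[label] / total_landslides)
        / (n_cells / total_cells)
        for label, n_cells in cells_per_class.items()
    }


# Hypothetical toy map: 10 cells, 4 of which contain landslides.
classes = ["low"] * 5 + ["moderate"] * 3 + ["high"] * 2
landslide = [1, 0, 0, 0, 0] + [1, 0, 0] + [1, 1]

fr = frequency_ratio(classes, landslide)
for label in ("low", "moderate", "high"):
    print(f"{label}: FR = {fr[label]:.3f}")
```

Under this definition, an FR above 1 means a class holds more landslide cells than its share of the area would suggest, which is why values above 1 in the high-susceptibility classes are read as a sign that the map separates susceptibility levels well.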
Original language: English
Article number: 440
Journal: Earth Science Informatics
Volume: 18
Issue number: 3
Early online date: 7 Jun 2025
Publication status: Published - Sept 2025
