AST: Adaptive Self-supervised Transformer for optical remote sensing representation

Qibin He*, Xian Sun*, Zhiyuan Yan, Bing Wang, Zicong Zhu, Wenhui Diao, Michael Ying Yang

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Due to the variation in spatial resolution and the diversity of object scales, the interpretation of optical remote sensing images is extremely challenging. Deep learning has become the mainstream solution to interpret such complex scenes. However, the explosion of deep learning model architectures has resulted in the need for hundreds of millions of remote sensing images for which labels are very costly or often unavailable publicly. This paper provides an in-depth analysis of the main reasons for this data thirst, i.e., (i) limited representational power for model learning, and (ii) underutilization of unlabeled remote sensing data. To overcome the above difficulties, we present a scalable and adaptive self-supervised Transformer (AST) for optical remote sensing image interpretation. By performing masked image modeling in pre-training, the proposed AST releases the rich supervision signals in massive unlabeled remote sensing data and learns useful multi-scale semantics. Specifically, a cross-scale Transformer architecture is designed to collaboratively learn global dependencies and local details by introducing a pyramid structure, to facilitate multi-granular feature interactions and generate scale-invariant representations. Furthermore, a masking token strategy relying on correlation mapping is proposed to achieve adaptive masking of partial patches without affecting key structures, which enhances the understanding of visually important regions. Extensive experiments on various optical remote sensing interpretation tasks show that AST has good generalization capability and competitiveness.
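
The adaptive masking idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the correlation mapping is computed, so everything here is an assumption made for illustration: the helper names `patchify`, `correlation_scores` and `adaptive_mask` are hypothetical, patch importance is taken to be the mean absolute Pearson correlation of a patch with all other patches, and the least correlated patches are the ones masked so that strongly correlated (assumed "key") structures stay visible. This is not the authors' implementation.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) image into non-overlapping, flattened patches."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    x = image[: gh * patch, : gw * patch]
    # (gh, patch, gw, patch, c) -> (gh, gw, patch, patch, c)
    x = x.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)


def correlation_scores(patches, eps=1e-8):
    """Assumed importance score: mean |Pearson correlation| with other patches."""
    z = patches - patches.mean(axis=1, keepdims=True)
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)
    corr = z @ z.T                      # (N, N) correlation matrix
    np.fill_diagonal(corr, 0.0)         # ignore self-correlation
    return np.abs(corr).sum(axis=1) / max(len(patches) - 1, 1)


def adaptive_mask(patches, mask_ratio=0.5):
    """Mask the lowest-scoring patches; returns (mask, scores), True = masked."""
    scores = correlation_scores(patches)
    n_mask = int(round(mask_ratio * len(patches)))
    order = np.argsort(scores, kind="stable")   # ascending importance
    mask = np.zeros(len(patches), dtype=bool)
    mask[order[:n_mask]] = True
    return mask, scores
```

Calling `adaptive_mask(patchify(image, 16), mask_ratio=0.5)` on a 64×64×3 array yields a boolean mask over 16 patches. In a masked image modeling setup, the masked patches would be hidden from the encoder and reconstructed during pre-training; the scoring rule and the mask ratio are placeholders that a real system would tune.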

Original language: English
Pages (from-to): 41-54
Number of pages: 14
Journal: ISPRS Journal of Photogrammetry and Remote Sensing
Volume: 200
Early online date: 5 May 2023
Publication status: Published - Jun 2023

Keywords

  • Interpretation
  • Masked image modeling
  • Optical remote sensing
  • Representation learning
  • Cross-scale transformer
