Scale-wise Bidirectional Alignment Network for referring remote sensing image segmentation

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

The goal of referring remote sensing image segmentation (RRSIS) is to extract specific pixel-level regions within an aerial image via a natural language expression. Recent advancements, particularly Transformer-based fusion designs, have demonstrated remarkable progress in this domain. However, existing methods primarily focus on refining visual features using language-aware guidance during the cross-modal fusion stage, neglecting the complementary vision-to-language flow. This limitation often leads to irrelevant or suboptimal representations. In addition, the diverse spatial scales of ground objects in aerial images pose significant challenges to the visual perception capabilities of existing models when conditioned on textual inputs. In this paper, we propose an innovative framework called Scale-wise Bidirectional Alignment Network (SBANet) to address these challenges for RRSIS. Specifically, we design a Bidirectional Alignment Module (BAM) with learnable query tokens to selectively and effectively represent visual and linguistic features, emphasizing regions associated with key tokens. BAM is further enhanced with a dynamic feature selection block to provide both macro- and micro-level visual features. This design preserves global context and local details, enabling more effective cross-modal interaction. Furthermore, SBANet incorporates a text-conditioned channel and spatial aggregator to bridge the gap between the encoder and decoder, enhancing cross-scale information exchange in complex aerial scenarios. Extensive experiments demonstrate that our proposed method achieves superior performance in comparison to previous state-of-the-art methods on the RRSIS-D and RefSegRS datasets, both quantitatively and qualitatively.
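
To make the two fusion directions named in the abstract concrete, the following is a minimal sketch of generic bidirectional cross-attention in plain Python. It is not the SBANet implementation: the function names (`attend`, `bidirectional_align`), the absence of learned projections and query tokens, and the residual update are all assumptions made for illustration only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention.

    Each query row is replaced by a weighted mix of the value rows,
    with weights given by its similarity to the key rows.
    """
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        dim = len(values[0])
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(dim)])
    return out


def bidirectional_align(visual, text):
    """Hypothetical helper: update both modalities from each other.

    visual: list of per-location feature vectors
    text:   list of per-token feature vectors (same feature size)
    """
    # Language-to-vision flow: visual locations gather text context.
    visual_ctx = attend(visual, text, text)
    # Vision-to-language flow: text tokens gather visual context.
    text_ctx = attend(text, visual, visual)
    # Residual update (an assumption of this sketch).
    visual_out = [[a + b for a, b in zip(v, c)]
                  for v, c in zip(visual, visual_ctx)]
    text_out = [[a + b for a, b in zip(t, c)]
                for t, c in zip(text, text_ctx)]
    return visual_out, text_out


# Toy inputs: 4 visual locations and 3 text tokens, 2 features each.
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
text = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.8]]
visual_out, text_out = bidirectional_align(visual, text)
```

Both outputs keep the shape of their inputs, so a block like this could in principle be applied at each encoder scale; how the paper actually combines scales is not specified in the abstract.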
Original language: English
Pages (from-to): 350-363
Number of pages: 14
Journal: ISPRS Journal of Photogrammetry and Remote Sensing
Volume: 226
Early online date: 29 May 2025
Publication status: Published - Aug 2025

Keywords

  • ITC-ISI-JOURNAL-ARTICLE
  • ITC-HYBRID
