TY - JOUR
T1 - CMGFNet: A deep cross-modal gated fusion network for building extraction from very high-resolution remote sensing images
T2 - ISPRS Journal of Photogrammetry and Remote Sensing
AU - Hosseinpour, Hamidreza
AU - Samadzadegan, Farhad
AU - Dadrass Javan, Farnaz
N1 - Funding Information:
The Potsdam and Vaihingen datasets are provided by the German Society for Photogrammetry, Remote Sensing and Geoinformation (DGPF). The dataset is published by the United States Geological Survey.
Publisher Copyright:
© 2021 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS)
PY - 2022/2/1
Y1 - 2022/2/1
AB - The extraction of urban structures such as buildings from very high-resolution (VHR) remote sensing imagery has improved dramatically, thanks to recent developments in deep multimodal fusion models. However, due to the variety of colour intensities and complex textures of building objects in VHR images, and the low quality of digital surface models (DSMs), it is challenging to design an optimal cross-modal fusion network that takes full advantage of these two modalities. This research presents an end-to-end cross-modal gated fusion network (CMGFNet) for extracting building footprints from VHR remote sensing images and DSM data. CMGFNet extracts multi-level features from RGB and DSM data using two separate encoders. We offer two methods for fusing features from the two modalities: cross-modal feature fusion and multi-level feature fusion. For cross-modal feature fusion, a gated fusion module (GFM) is proposed to combine the two modalities efficiently. Multi-level feature fusion combines high-level features from deep layers with low-level features from shallower layers through a top-down strategy. Furthermore, a residual-like depth-wise separable convolution (R-DSC) is introduced to improve the up-sampling process and to reduce the number of parameters and the time complexity of the decoder. Experimental results on challenging datasets show that CMGFNet outperforms other state-of-the-art models. An extensive ablation study also confirms the efficacy of all significant components.
KW - Building extraction
KW - Cross-modal
KW - Digital surface model
KW - Gated fusion module
KW - VHR remote sensing image
DO - 10.1016/j.isprsjprs.2021.12.007
M3 - Article
AN - SCOPUS:85121925899
SN - 0924-2716
VL - 184
SP - 96
EP - 115
JO - ISPRS Journal of Photogrammetry and Remote Sensing
JF - ISPRS Journal of Photogrammetry and Remote Sensing
ER -