TransFusion: Multi-modal Fusion Network for Semantic Segmentation

Abhisek Maiti*, Sander Oude Elberink, George Vosselman

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic > peer-review

Abstract

The complementary properties of 2D color images and 3D point clouds can potentially improve semantic segmentation compared to using uni-modal data. Multi-modal data fusion is, however, challenging due to the heterogeneity and dimensionality of the data, the difficulty of aligning different modalities to the same reference frame, and the presence of modality-specific bias. In this regard, we propose a new model, TransFusion, for semantic segmentation that fuses images directly with point clouds without the need for lossy pre-processing of the point clouds. TransFusion outperforms the baseline FCN model that uses images with depth maps. Compared to the baseline, our method improved mIoU by 4% and 2% for the Vaihingen and Potsdam datasets, respectively. We demonstrate the capability of our proposed model to adequately learn the spatial and structural information, resulting in better inference.

Original language: English
Title of host publication: Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023
Publisher: IEEE
Pages: 6537-6547
Number of pages: 11
ISBN (Electronic): 9798350302493
Publication status: Published - 2023
Event: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023 - Vancouver, Canada
Duration: 18 Jun 2023 to 22 Jun 2023
https://cvpr2023.thecvf.com/

Publication series

Name: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
Volume: 2023-June
ISSN (Print): 2160-7508
ISSN (Electronic): 2160-7516

Conference

Conference: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
Abbreviated title: CVPR 2023
Country/Territory: Canada
City: Vancouver
Period: 18/06/23 to 22/06/23

Keywords

  • 2023 OA procedure
