Hierarchical building use classification from multiple modalities with a multi-label multimodal transformer network

Wen Zhou*, Claudio Persello, Alfred Stein

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Building use information is important for urban planning, city digital twins, and informed policy formulation. Prior research has predominantly mapped building use in broad categories, which offers only general insight into how buildings are actually used. Our study investigates the extraction of hierarchical building use categories, encompassing both broad and detailed classifications while accounting for mixed use. To achieve this, we explore the fusion of building function information from satellite images, digital surface models (DSM), street view images, and point of interest (POI) data. We propose a novel multi-label multimodal transformer-based feature fusion network that simultaneously predicts four broad categories and 13 detailed categories. Experimental results demonstrate the efficacy of our method: it maps most of the building use categories, achieving weighted average F1 scores of 91% for the four broad categories and 77% for the 13 detailed categories. Our experiments underscore the critical role of satellite images in building use classification, with the inclusion of DSM and POI data significantly enhancing classification accuracy. By considering detailed use categories and accounting for mixed use, our method provides finer-grained insight into land use patterns, thereby contributing to urban planning and management.
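
As an illustration of the reported metric, the following minimal sketch computes a weighted average F1 score for multi-label predictions. It assumes the common support-weighted convention (per-label F1 weighted by the number of true instances of each label, as in scikit-learn's `average='weighted'`); the abstract does not state which convention was used. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List


def f1_for_label(y_true: List[List[int]], y_pred: List[List[int]], j: int) -> float:
    """F1 score of label column j, treating it as a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0 when a label is neither present nor predicted.
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Support-weighted mean of the per-label F1 scores."""
    n_labels = len(y_true[0])
    supports = [sum(row[j] for row in y_true) for j in range(n_labels)]
    total = sum(supports)
    if total == 0:
        return 0.0
    return sum(
        f1_for_label(y_true, y_pred, j) * supports[j] for j in range(n_labels)
    ) / total


# Hypothetical toy data: 4 buildings, 4 broad use labels (one column per label).
# A row with more than one 1 represents a mixed-use building.
y_true = [[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0]]
y_pred = [[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 1]]

print(f"{weighted_f1(y_true, y_pred):.3f}")  # prints 0.867
```

Because each building can carry several labels at once, every label column is scored as its own binary problem before the support-weighted average is taken.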

Original language: English
Article number: 104038
Journal: International Journal of Applied Earth Observation and Geoinformation
Volume: 132
Publication status: Published - Aug 2024

Keywords

  • Building hierarchical use classification
  • Mixed-use
  • Multi-label classification
  • Multimodal integration
