Building usage classification using a transformer-based multimodal deep learning method

Wen Zhou*, C. Persello, A. Stein

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


Abstract

Building usage classification is of great significance for urban planning and city digital twinning applications. So far, however, the problem of mixed building use has not been addressed, and detailed categories cannot be assigned to individual buildings. This paper employs a state-of-the-art Transformer-based multimodal deep learning method to extract image features from satellite images and fuse them with textual features from point-of-interest (POI) data. The derived features, along with the relationship between the two types of data, are used for the classification task. A custom dataset prepared for the city of Wuhan, China, with eight land-use categories was classified, yielding a micro F1-score of 80.7%. Results show that the proposed method effectively improves the classification results, achieving 5.6% higher accuracy than results based on a single data source.
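The micro F1-score reported above can be illustrated with a minimal sketch. It assumes a multi-label setup in which each building may carry several of the eight categories at once (a plausible reading of "mixed use", not something the abstract states), and it pools true positives, false positives, and false negatives over every building-category decision before computing F1. The function name `micro_f1` and the toy labels are hypothetical and are not taken from the paper.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over all (building, category) decisions.

    y_true and y_pred are lists of rows, one row per building,
    with one 0/1 entry per category.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1      # category present and predicted
            elif p:
                fp += 1      # predicted but not present
            elif t:
                fn += 1      # present but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy example: 3 buildings, 4 categories (not data from the paper).
y_true = [[1, 0, 1, 0],
          [0, 1, 0, 0],
          [1, 1, 0, 0]]
y_pred = [[1, 0, 0, 0],
          [0, 1, 0, 0],
          [1, 0, 0, 1]]

score = micro_f1(y_true, y_pred)
print(f"micro F1 = {score:.3f}")
```

Because the counts are pooled before the ratio is taken, frequent categories weigh more heavily in the micro average than rare ones, which is worth keeping in mind when land-use classes are imbalanced.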

Original language: English
Title of host publication: 2023 Joint Urban Remote Sensing Event, JURSE 2023
Publisher: IEEE
Number of pages: 4
ISBN (Electronic): 9781665493734
Publication status: Published - 8 Jun 2023
Event: Joint Urban Remote Sensing Event, JURSE 2023 - Heraklion, Greece
Duration: 17 May 2023 to 19 May 2023
http://jurse2023.org/

Publication series

Name: 2023 Joint Urban Remote Sensing Event, JURSE 2023

Conference

Conference: Joint Urban Remote Sensing Event, JURSE 2023
Abbreviated title: JURSE 2023
Country/Territory: Greece
City: Heraklion
Period: 17/05/23 to 19/05/23

Keywords

  • feature fusion
  • Mixed-use classification
  • multimodal deep learning
  • natural language processing
