Building use and mixed-use classification with a transformer-based network fusing satellite images and geospatial textual information

Wen Zhou*, C. Persello, Mengmeng Li, A. Stein

*Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-review



Assigning detailed use categories to buildings is a challenging and relevant task in urban land-use classification, with applications in urban planning, digital city modelling and digital twinning. This study categorises buildings by detailed use while explicitly accounting for mixed use. Mixed use combines several use types within one building and is treated here as a separate use category. We derive attribute information by combining satellite imagery, which captures spatial information, with textual information from publicly available point-of-interest (POI) data that are collected by citizens and published on online maps. We propose a multimodal transformer-based building-use classification method that captures and fuses these data sources within an end-to-end learning workflow. We evaluate the method on four urban areas in China. Experiments show that it maps building use into eight fine-grained categories, reaching a Micro F1 score of 80.9% and a Macro F1 score of 62% for the Wuhan study area. The method exploits the relationships between the features obtained from the different data sources and achieves higher accuracy than state-of-the-art fusion-based multimodal integration methods. It increases the attributive grain (level of detail) of building-use information while maintaining high classification accuracy.
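The abstract reports both a Micro F1 score (80.9%) and a Macro F1 score (62%). The gap between them follows from how each averages over classes: micro F1 pools true positives, false positives and false negatives across all buildings, so frequent classes dominate, whereas macro F1 averages the per-class F1 scores with equal weight, so poorly recognised classes pull it down. The minimal Python sketch below, assuming single-label predictions (one use category per building), computes both scores with the standard definition F1 = 2TP / (2TP + FP + FN). The function name `micro_macro_f1` and the category labels in the example are hypothetical; they are not taken from the paper's code or its eight-category scheme.

```python
from collections import Counter


def micro_macro_f1(y_true, y_pred, labels=None):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions.

    Uses F1 = 2*TP / (2*TP + FP + FN), which equals the harmonic mean of
    precision and recall.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # predicted class p was wrong
            fn[t] += 1   # true class t was missed

    # Micro F1: pool the counts over all classes, then compute a single F1.
    total_tp = sum(tp[c] for c in labels)
    total_fp = sum(fp[c] for c in labels)
    total_fn = sum(fn[c] for c in labels)
    micro_denom = 2 * total_tp + total_fp + total_fn
    micro = 2 * total_tp / micro_denom if micro_denom else 0.0

    # Macro F1: compute F1 per class, then take the unweighted mean.
    per_class = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    macro = sum(per_class) / len(per_class) if per_class else 0.0
    return micro, macro


# Hypothetical, imbalanced example: eight residential buildings plus one
# commercial and one mixed-use building, both misclassified as residential.
y_true = ["residential"] * 8 + ["commercial", "mixed-use"]
y_pred = ["residential"] * 10
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
# micro F1 = 0.800, macro F1 = 0.296
```

In the single-label setting, micro F1 equals overall accuracy (0.8 here), whereas the two misclassified minority classes each contribute an F1 of zero and drive macro F1 down to about 0.30. This is one general reason why a macro score can sit well below the corresponding micro score on imbalanced data.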

Original language: English
Article number: 113767
Number of pages: 21
Journal: Remote Sensing of Environment
Early online date: 24 Aug 2023
Publication status: Published, 1 Nov 2023


  • Building use classification
  • Data fusion
  • Mixed-use classification
  • Multimodal deep learning
  • Natural language processing
  • Remote sensing
  • Transformers
  • UT-Hybrid-D


