TY - JOUR
T1 - Hierarchical building use classification from multiple modalities with a multi-label multimodal transformer network
AU - Zhou, Wen
AU - Persello, Claudio
AU - Stein, Alfred
N1 - Publisher Copyright: © 2024 The Author(s)
PY - 2024/8
Y1 - 2024/8
AB - Building use information is important for urban planning, city digital twins, and informed policy formulation. Prior research has predominantly focused on mapping building use in broad categories, offering general insight into their actual use. Our study investigates the extraction of hierarchical building categories, encompassing both broad and detailed classifications while accounting for mixed-use. To achieve this, we explore the fusion of building function information from satellite images, digital surface models (DSM), street view images, and point of interest (POI) data. We propose a novel multi-label multimodal transformer-based feature fusion network, which is capable of simultaneously predicting four broad categories and 13 detailed categories. Experimental results demonstrate the efficacy of our method, as it maps most of the building use categories, with the weighted average F1 score for four broad categories and 13 detailed categories of 91% and 77%, respectively. Our experiments underscore the critical role of satellite images in building use classification, with the inclusion of DSM data and POI significantly enhancing the classification accuracy. By considering detailed use categories and accounting for mixed-use, our method provides more detailed insights into land use patterns, thereby contributing to urban planning and management.
KW - Building hierarchical use classification
KW - Mixed-use
KW - Multi-label classification
KW - Multimodal integration
U2 - 10.1016/j.jag.2024.104038
DO - 10.1016/j.jag.2024.104038
M3 - Article
AN - SCOPUS:85199096091
SN - 1569-8432
VL - 132
JO - International Journal of Applied Earth Observation and Geoinformation
JF - International Journal of Applied Earth Observation and Geoinformation
M1 - 104038
ER -