Building polygon extraction from aerial images and digital surface models with a frame field learning framework

Xiaoyu Sun, Wufan Zhao, Raian V. Maretto, Claudio Persello*

*Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-review



Deep learning-based models for building delineation from remotely sensed images face the challenge of producing precise and regular building outlines. This study investigates the combination of normalized digital surface models (nDSMs) with aerial images to optimize the extraction of building polygons using the frame field learning method. Results are evaluated at the pixel, object, and polygon levels. In addition, an analysis assesses the statistical deviations in the number of vertices of the extracted building polygons compared with the reference. This vertex-count comparison aims to identify the output polygons that are easiest for human analysts to edit in operational applications, and it can serve as guidance for reducing the post-processing workload required to obtain high-accuracy building footprints. Experiments conducted in Enschede, the Netherlands, demonstrate that, by introducing the nDSM, the method could reduce the number of false positives and avoid missing real buildings on the ground. Positional accuracy and shape similarity were improved, resulting in better-aligned building polygons. The method achieved a mean intersection over union (IoU) of 0.80 with the fused data (RGB + nDSM), against an IoU of 0.57 with the baseline (RGB only) in the same area. A qualitative analysis of the results shows that the investigated model predicts more precise and regular polygons for large and complex structures.
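To make the reported metrics concrete, the following is a minimal sketch of how a mean IoU over binary building masks and a per-polygon vertex-count difference might be computed. It is not the authors' evaluation code: the function names (`iou`, `mean_iou`, `vertex_count_diff`), the list-of-lists mask representation, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
from typing import Iterable, Sequence, Tuple

# Assumed representation: a 2D binary mask, 1 = building, 0 = background.
Mask = Sequence[Sequence[int]]
# Assumed representation: a polygon as an ordered list of (x, y) vertices.
Polygon = Sequence[Tuple[float, float]]


def iou(pred: Mask, ref: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    if len(pred) != len(ref) or any(len(p) != len(r) for p, r in zip(pred, ref)):
        raise ValueError("masks must have the same shape")
    inter = 0
    union = 0
    for p_row, r_row in zip(pred, ref):
        for p, r in zip(p_row, r_row):
            if p and r:
                inter += 1
            if p or r:
                union += 1
    # Assumed convention: two empty masks agree perfectly.
    return inter / union if union else 1.0


def mean_iou(pairs: Iterable[Tuple[Mask, Mask]]) -> float:
    """Average IoU over (prediction, reference) mask pairs."""
    scores = [iou(p, r) for p, r in pairs]
    if not scores:
        raise ValueError("at least one mask pair is required")
    return sum(scores) / len(scores)


def vertex_count_diff(pred: Polygon, ref: Polygon) -> int:
    """Signed difference in vertex count; positive means extra vertices."""
    return len(pred) - len(ref)
```

A value of `vertex_count_diff` close to zero suggests that the predicted polygon has a complexity similar to the reference and may need less manual editing, which is the intuition behind the vertex analysis described in the abstract.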
Original language: English
Article number: 4700
Pages (from-to): 1-21
Number of pages: 21
Journal: Remote Sensing
Issue number: 22
Publication status: Published - 20 Nov 2021



