Building outline delineation: From aerial images to polygons with an improved end-to-end learning framework

Wufan Zhao, C. Persello, A. Stein

Research output: Contribution to journal › Article › Academic › peer-review



Deep learning methods based on convolutional neural networks (CNNs) have demonstrated impressive performance in building outline delineation from very high resolution (VHR) remote sensing (RS) imagery. In this paper, we introduce an improved method that predicts regularized building outlines in a vector format within an end-to-end deep learning framework. The main idea of our framework is to learn to predict the locations of the key vertices of buildings and connect them in sequence. The proposed method is based on PolyMapper. We upgrade the feature extraction by introducing global context and boundary refinement blocks, and we add channel and spatial attention modules to improve the effectiveness of the detection module. In addition, we introduce a stacked conv-GRU to further preserve the geometric relationship between vertices and accelerate inference. We tested our method on two large-scale VHR-RS building extraction datasets. The results on both COCO and PoLiS metrics demonstrate better performance compared with Mask R-CNN and PolyMapper. Specifically, we achieve absolute improvements of 4.2 mask mean average precision (mAP) and 3.7 mean average recall (mAR) compared to PolyMapper. Also, the qualitative comparison shows that our method significantly improves the instance segmentation of buildings of various shapes.
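The abstract names the PoLiS metric without defining it. As an illustration only, the following is a minimal Python sketch of PoLiS as it is commonly defined in the literature: the average distance from each vertex of one polygon to the boundary of the other, computed in both directions, with each direction weighted by one half. The function names (`point_segment_distance`, `point_boundary_distance`, `polis`) and the toy squares are assumptions made for this example and are not taken from the paper; the authors' evaluation code may differ.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_boundary_distance(p, polygon):
    """Shortest distance from p to the closed boundary of a polygon.

    The polygon is a list of (x, y) vertices in order; the last vertex
    is implicitly connected back to the first.
    """
    n = len(polygon)
    return min(
        point_segment_distance(p, polygon[i], polygon[(i + 1) % n])
        for i in range(n)
    )


def polis(poly_a, poly_b):
    """Symmetric PoLiS distance between two polygons (assumed definition)."""
    term_a = sum(point_boundary_distance(v, poly_b) for v in poly_a)
    term_b = sum(point_boundary_distance(v, poly_a) for v in poly_b)
    return term_a / (2 * len(poly_a)) + term_b / (2 * len(poly_b))


# Hypothetical example: a reference square and a prediction shifted by 1 in x.
reference = [(0, 0), (10, 0), (10, 10), (0, 10)]
predicted = [(1, 0), (11, 0), (11, 10), (1, 10)]
print(polis(reference, predicted))  # 0.5
```

Lower values indicate closer agreement between the predicted and reference outlines, and identical polygons give a distance of zero. Unlike the mask-based COCO metrics, this measure is computed directly on polygon vertices.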

Original language: English
Pages (from-to): 119-131
Number of pages: 13
Journal: ISPRS Journal of Photogrammetry and Remote Sensing
Early online date: 16 Mar 2021
Publication status: Published - May 2021


  • Building outline delineation
  • Convolutional neural networks
  • Optical remote sensing imagery
  • Polygon prediction
  • Recurrent neural networks
  • UT-Hybrid-D

