Deep learning for monocular depth estimation from UAV images

L. Madhuanand*, F. Nex, M. Y. Yang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Chapter › Academic › peer-review

5 Citations (Scopus)
394 Downloads (Pure)


Depth is an essential component of many scene-understanding tasks and of reconstructing the 3D geometry of a scene. Estimating depth from stereo images requires multiple views of the same scene to be captured, which is often not possible when exploring new environments with an Unmanned Aerial Vehicle (UAV). To overcome this, monocular depth estimation has become a topic of interest, driven by recent advances in computer vision and deep learning. Research in this area has focused largely on indoor scenes or on outdoor scenes captured at ground level. Single-image depth estimation from aerial images has been limited by additional complexities arising from the increased camera distance and the wider area coverage, which brings many occlusions. A new aerial image dataset is prepared specifically for this purpose, combining UAV images that cover different regions, features and points of view. The single-image depth estimation is based on image reconstruction techniques, which use stereo images during training to learn to estimate depth from single images. Of the various models available for ground-level single-image depth estimation, two are used to learn depth from UAV images: 1) a Convolutional Neural Network (CNN) and 2) a Generative Adversarial Network (GAN). These models generate pixel-wise disparity maps, which can be converted into depth. The disparity maps generated by these models are evaluated for their internal quality using various error metrics. The results show that the CNN model generates smoother images with higher disparity ranges, while the GAN model generates sharper images with a smaller disparity range. The disparity maps are then converted to depth and compared with point clouds obtained using Pix4D. The CNN model performs better than the GAN and produces depth similar to that of Pix4D. This comparison helps to streamline efforts to produce depth from a single aerial image.
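The abstract states that the models learn depth from stereo images through image reconstruction. A minimal sketch of that training signal follows, assuming a rectified grayscale pair with the left camera to the left of the right one and a positive left-view disparity in pixels: the right image is warped into the left view by sampling it at x - d(x, y), and the L1 photometric error against the real left image is what a network would minimise. The function names and toy data are hypothetical, and this is not the authors' CNN or GAN; practical losses usually add further terms, such as SSIM and disparity smoothness.

```python
def sample_row(row, x):
    """Linearly interpolate a 1-D list of intensities at fractional position x.

    Positions outside the row are clamped to the nearest border pixel.
    """
    x = min(max(x, 0.0), len(row) - 1.0)
    x0 = int(x)                      # floor, since x >= 0 after clamping
    x1 = min(x0 + 1, len(row) - 1)
    w = x - x0                       # interpolation weight of the right neighbour
    return (1.0 - w) * row[x0] + w * row[x1]


def reconstruct_left(right, disp_left):
    """Warp the right image into the left view: I_left_hat(x, y) = I_right(x - d(x, y), y)."""
    return [
        [sample_row(right[y], x - disp_left[y][x]) for x in range(len(right[y]))]
        for y in range(len(right))
    ]


def l1_photometric_loss(img_a, img_b):
    """Mean absolute intensity difference between two equally sized images."""
    diffs = [abs(a - b) for row_a, row_b in zip(img_a, img_b) for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


# Toy one-row example: the left view is the right view shifted by one pixel.
right = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0]]
left = [[0.0, 0.0, 1.0, 2.0, 3.0, 4.0]]
correct = l1_photometric_loss(left, reconstruct_left(right, [[1.0] * 6]))
wrong = l1_photometric_loss(left, reconstruct_left(right, [[0.0] * 6]))
print(correct, wrong)  # 0.0 0.8333333333333334
```

The correct disparity reproduces the left view exactly, so its loss is lower than that of the wrong disparity; a trained network is pushed toward disparities with this property. Linear interpolation is used because it keeps the warp differentiable with respect to the disparity.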
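Converting a disparity map to depth follows the standard relation for a rectified pair, Z = f * B / d, with the focal length f and disparity d in pixels and the baseline B (the distance between the two camera positions) in metres. The sketch below assumes those units; the function names and camera values are hypothetical, not taken from the paper. If a network outputs disparity as a fraction of image width, as some self-supervised models do, it must first be scaled by the image width in pixels.

```python
import math


def normalized_to_pixels(disp_norm, image_width):
    """Scale disparity expressed as a fraction of image width to pixels."""
    return [[d * image_width for d in row] for row in disp_norm]


def disparity_to_depth(disp_px, focal_px, baseline_m, eps=1e-6):
    """Convert a pixel disparity map to metric depth using Z = f * B / d.

    Pixels with (near-)zero or negative disparity have no finite depth and
    are returned as math.inf.
    """
    return [
        [focal_px * baseline_m / d if d > eps else math.inf for d in row]
        for row in disp_px
    ]


# Hypothetical camera: 1000 px focal length, 0.5 m baseline.
depth = disparity_to_depth([[10.0, 20.0], [5.0, 0.0]], focal_px=1000.0, baseline_m=0.5)
print(depth)  # [[50.0, 25.0], [100.0, inf]]
```

Because depth is inversely proportional to disparity, small disparity errors on distant surfaces translate into large depth errors, which matters at the camera distances typical of UAV imagery.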
Original language: English
Title of host publication: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Subtitle of host publication: XXIV ISPRS Congress
Editors: N. Paparoditis, C. Mallet, F. Lafarge, F. Remondino, I. Toschi, T. Fuse
Publisher: International Society for Photogrammetry and Remote Sensing (ISPRS)
Number of pages: 8
Publication status: Published - 3 Aug 2020
Event: XXIVth ISPRS Congress 2020 - Virtual Event, Nice, France
Duration: 4 Jul 2020 to 10 Jul 2020
Conference number: 24

Publication series

Name: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
ISSN (Print): 2194-9042


Conference: XXIVth ISPRS Congress 2020
Abbreviated title: ISPRS 2020

