Unmanned Aerial Vehicles (UAVs) for 3D indoor mapping applications are often equipped with bulky and expensive sensors, such as LiDAR (Light Detection and Ranging) or depth cameras. The same task could also be performed with inexpensive RGB cameras installed on small, light platforms that are more agile in confined spaces, for example during emergencies. However, this task remains challenging because the absence of a GNSS (Global Navigation Satellite System) signal hampers the localization (and scaling) of the UAV. In addition, the reduced density of points in feature-based monocular SLAM (Simultaneous Localization and Mapping) limits the completeness of the delivered maps. In this paper, the real-time capabilities of a commercial, inexpensive UAV (DJI Tello) for indoor mapping are investigated. The work aims to assess its suitability for quick mapping in emergency conditions to support First Responders (FRs) during rescue operations in collapsed buildings. The proposed solution uses only images as input and integrates SLAM and Convolutional Neural Network (CNN)-based Single Image Depth Estimation (SIDE) algorithms to densify and scale the data and to deliver a map of the environment suitable for real-time exploration. The implemented algorithms, the training strategy of the network, and the first tests on the main elements of the proposed methodology are reported in detail. The results achieved in real indoor environments are also presented, demonstrating performance compatible with FRs’ requirements for exploring indoor volumes before entering a building.