Semantic segmentation using convolutional neural networks (CNNs) achieves higher accuracy than traditional methods, but fails to yield satisfactory results under illumination variations when the training set is limited. In this paper, we present a new data set containing both real and rendered images and a novel cascade network to study semantic segmentation in low-light indoor environments. Specifically, the network decomposes a low-light image into illumination and reflectance components and then applies a multi-task learning scheme. One branch learns to reduce noise and restore information in the reflectance (reflectance restoration branch). Another branch learns to segment the reflectance map (semantic segmentation branch). The CNN features from the two tasks are concatenated to improve segmentation accuracy by embedding illumination-invariant features. We compare our approach with other CNN-based segmentation frameworks, including the state-of-the-art DeepLab v3+, on the proposed real data set, and our approach achieves the highest mIoU (47.6%). The experimental results also show that the semantic information supports the restoration of a sharper reflectance map, thus further improving the segmentation. In addition, we pre-train a model on the proposed large-scale rendered images and then fine-tune it on the real images. The pre-training results in an improvement of mIoU by 7.2%. Our models and data sets are publicly available for research on our website. This research is part of the EU project INGENIOUS.
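The results above are reported as mIoU (mean intersection over union). As a reading aid, the following is a minimal sketch of the common per-class definition of that metric. It is not the authors' evaluation code: the function name `mean_iou`, the flattened label lists, and the choice to skip classes absent from both maps are assumptions made for illustration, and the abstract does not state the exact evaluation protocol.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection over union for two flattened label maps.

    Assumption: labels are integers in [0, num_classes) and classes that
    appear in neither map are left out of the average.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    # Count per-class pixels in the prediction, the ground truth, and both.
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both maps
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with four pixels and two classes.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In the toy example, class 0 has an IoU of 1/2 and class 1 has an IoU of 2/3, so the mean is 7/12. Benchmark evaluations usually accumulate the per-class counts over the whole test set before averaging, rather than averaging per-image scores.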
|Number of pages||12|
|Journal||ISPRS Journal of Photogrammetry and Remote Sensing|
|Early online date||7 Dec 2021|
|Publication status||Published - Jan 2022|