Box-level segmentation supervised deep neural networks for accurate and real-time multispectral pedestrian detection

Yanpeng Cao, Dayan Guan, Yulun Wu, Jiangxin Yang* (Corresponding Author), Yanlong Cao, Michael Ying Yang

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

33 Citations (Scopus)
35 Downloads (Pure)


Effective fusion of the complementary information captured by multi-modal sensors (visible and infrared cameras) enables robust pedestrian detection in a variety of surveillance situations (e.g., daytime and nighttime). In this paper, we present a novel box-level segmentation supervised learning framework for accurate and real-time multispectral pedestrian detection that incorporates features extracted from both the visible and infrared channels. Specifically, our method takes pairs of aligned visible and infrared images with easily obtained bounding-box annotations as input and estimates accurate prediction maps that highlight the presence of pedestrians. It offers two major advantages over existing anchor-box-based multispectral detection methods. First, it avoids the hyperparameter-setting problem that arises during the training of anchor-box-based detectors and obtains more accurate detection results, especially for small and occluded pedestrian instances. Second, it is capable of generating accurate detection results from small-size input images, improving computational efficiency for real-time autonomous-driving applications. Experimental results on the KAIST multispectral dataset show that our proposed method outperforms state-of-the-art approaches in terms of both accuracy and speed.
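The "box-level segmentation supervision" the abstract describes replaces per-anchor regression targets with dense masks derived directly from bounding-box annotations. A minimal sketch of that box-to-mask conversion is shown below; the function name, coordinate convention, and clipping behavior are assumptions for illustration, not details taken from the paper:

```python
import numpy as np

def boxes_to_mask(height, width, boxes):
    """Convert pedestrian bounding boxes into a box-level segmentation
    target: pixels inside any box become foreground (1), all other
    pixels background (0). `boxes` is an iterable of (x1, y1, x2, y2)
    in pixel coordinates, half-open on the right and bottom edges."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for x1, y1, x2, y2 in boxes:
        # Clip to the image bounds so malformed annotations cannot overflow.
        x1, x2 = max(0, x1), min(width, x2)
        y1, y2 = max(0, y1), min(height, y2)
        if x1 < x2 and y1 < y2:
            mask[y1:y2, x1:x2] = 1  # fill the box interior
    return mask
```

Such a mask can serve as the target of a per-pixel loss on the network's prediction map, sidestepping anchor hyperparameters (scales, aspect ratios, IoU thresholds) entirely.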
Original language: English
Pages (from-to): 70-79
Number of pages: 10
Journal: ISPRS Journal of Photogrammetry and Remote Sensing
Issue number: April
Publication status: Published - Apr 2019


Keywords:
  • Multispectral data
  • Pedestrian detection
  • Deep neural networks
  • Box-level segmentation
  • Real-time application
