Introduction to multimodal scene understanding

Michael Ying Yang, Bodo Rosenhahn, Vittorio Murino

Research output: Chapter in Book/Report/Conference proceeding; Chapter; Academic; peer-review

1 Citation (Scopus)

Abstract

A fundamental goal of computer vision is to discover the semantic information within a given scene, commonly referred to as scene understanding. The overall goal is to find a mapping that derives semantic information from sensor data, which is an extremely challenging task, partly due to ambiguities in the appearance of the data. However, most of the scene understanding tasks tackled so far involve visual modalities only. In this book, we aim to provide an overview of recent advances in algorithms and applications that involve multiple sources of information for scene understanding. In this context, deep learning models are particularly suitable for combining multiple modalities and, as a matter of fact, many contributions use such architectures to exploit all data streams and obtain optimal performance. We conclude the book’s introduction with a concise description of the remaining chapters, which focus on providing an understanding of the state of the art, open problems, and future directions in multimodal scene understanding as a scientific discipline.

Original language: English
Title of host publication: Multimodal Scene Understanding
Subtitle of host publication: Algorithms, Applications and Deep Learning
Editors: Michael Ying Yang, Bodo Rosenhahn, Vittorio Murino
Publisher: Elsevier
Pages: 1-7
Number of pages: 7
ISBN (Electronic): 9780128173589
DOIs: https://doi.org/10.1016/B978-0-12-817358-9.00007-X
Publication status: Published - 2 Aug 2019

Keywords

  • Computer vision
  • Deep learning
  • Multimodality
  • Scene understanding

Cite this

Yang, M. Y., Rosenhahn, B., & Murino, V. (2019). Introduction to multimodal scene understanding. In M. Y. Yang, B. Rosenhahn, & V. Murino (Eds.), Multimodal Scene Understanding: Algorithms, Applications and Deep Learning (pp. 1-7). Elsevier. https://doi.org/10.1016/B978-0-12-817358-9.00007-X