Continuous dimensional models of human affect, such as those based on valence and arousal, have been shown to be more accurate in describing a broad range of spontaneous, everyday emotions than the more traditional models of discrete stereotypical emotion categories (e.g. happiness, surprise). However, most prior work on estimating valence and arousal considered only laboratory settings and acted data. It is unclear whether the findings of these studies also hold when the methodologies proposed in these works are tested on data collected in-the-wild. In this paper we investigate this. We propose a new dataset of highly accurate per-frame annotations of valence and arousal for 600 challenging video clips extracted from feature films (also used in part for the AFEW dataset). For each video clip, we further provide per-frame annotations of 68 facial landmarks. We subsequently evaluate a number of common baseline and state-of-the-art methods on both a commonly used laboratory recording dataset (Semaine database) and the newly proposed recording set (AFEW-VA). Our results show that geometric features perform well independently of the settings. However, as expected, methods that perform well on constrained data do not necessarily generalise to uncontrolled data and vice-versa.
- Continuous affect estimation in-the-wild
- Dimensional affect recognition in-the-wild
- Dimensional emotion modelling
- Facial expressions