Abstract
The extent of agreement or disagreement a person expresses is an important indicator of inter-personal relations and emotion expression, which makes its automatic detection a valuable capability. Most existing methods for automated analysis of human agreement from audio-visual data perform agreement detection using either the audio or the visual modality of human interactions. However, this is suboptimal, as the expression of different agreement levels is composed of various facial and vocal cues specific to the target level. To this end, we propose the first approach for multi-modal estimation of agreement intensity levels. Specifically, our model leverages the feature representation power of Multimodal Neural Networks (NN) and the discriminative power of Conditional Ordinal Random Fields (CORF) to achieve dynamic classification of agreement levels from videos. We show on the MAHNOB-Mimicry database of dyadic human interactions that the proposed approach outperforms its uni-modal and linear counterparts, as well as related models applicable to the target task.
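
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: fusing per-frame audio and visual features with a small neural layer and scoring ordinal agreement levels with a cumulative-threshold (ordered-logit style) output, used here as a simplified stand-in for the CORF. It ignores the temporal dependencies a CORF would model, and all dimensions, weights, and thresholds are illustrative assumptions rather than the authors' model.

```python
# Minimal sketch (not the authors' implementation): neural fusion of audio and
# visual frame features, followed by an ordered-logit style ordinal output.
import numpy as np

rng = np.random.default_rng(0)

def fuse_features(audio_feat, visual_feat, w_a, w_v, b):
    """Project audio and visual frame features into a shared hidden space."""
    return np.tanh(audio_feat @ w_a + visual_feat @ w_v + b)

def ordinal_probs(hidden, w_out, thresholds):
    """Cumulative-threshold scoring: P(level > k) = sigmoid(f(x) - b_k)."""
    f = hidden @ w_out                                   # latent intensity per frame, shape (T,)
    cum = 1.0 / (1.0 + np.exp(-(f[:, None] - thresholds[None, :])))  # P(level > k), shape (T, K-1)
    padded = np.hstack([np.ones((len(f), 1)), cum, np.zeros((len(f), 1))])
    return -np.diff(padded, axis=1)                      # P(level = k), shape (T, K)

# Toy data: 10 frames, 20-dim audio / 30-dim visual features, 4 agreement levels (all assumed).
T, d_a, d_v, H, K = 10, 20, 30, 16, 4
audio  = rng.normal(size=(T, d_a))
visual = rng.normal(size=(T, d_v))
w_a = rng.normal(size=(d_a, H)) * 0.1
w_v = rng.normal(size=(d_v, H)) * 0.1
b = np.zeros(H)
w_out = rng.normal(size=H)
thresholds = np.array([-1.0, 0.0, 1.0])                  # K-1 ordered cut points (assumed)

hidden = fuse_features(audio, visual, w_a, w_v, b)
probs = ordinal_probs(hidden, w_out, thresholds)
print(probs.argmax(axis=1) + 1)                          # predicted agreement level per frame
```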
Original language | English |
---|---|
Title of host publication | 2016 23rd International Conference on Pattern Recognition (ICPR) |
Place of Publication | USA |
Publisher | IEEE |
Pages | 2228-2233 |
Number of pages | 6 |
ISBN (Electronic) | 978-1-5090-4847-2 |
ISBN (Print) | 978-1-5090-4848-9 |
DOIs | |
Publication status | Published - Apr 2017 |
Event | 23rd International Conference on Pattern Recognition 2016, Cancun, Mexico. Duration: 4 Dec 2016 → 8 Dec 2016. Conference number: 23. http://www.icpr2016.org/site/ |
Conference
Conference | 23rd International Conference on Pattern Recognition 2016 |
---|---|
Abbreviated title | ICPR 2016 |
Country/Territory | Mexico |
City | Cancun |
Period | 4/12/16 → 8/12/16 |
Internet address | http://www.icpr2016.org/site/ |
Keywords
- Automatic speech and speaker recognition
- Affective Computing
- EWI-27585
- HMI-HF: Human Factors