Abstract
The problem of automatically estimating the interest level of a subject has been gaining attention from researchers, mostly due to the wide applicability of interest detection. In this work, we obtain a set of continuous interest annotations for the SEMAINE database, which we also analyse in terms of emotion dimensions such as valence and arousal. Most importantly, we propose a robust variant of Canonical Correlation Analysis (RCCA) for performing audio-visual fusion, which we apply to the prediction of interest. RCCA recovers a low-rank subspace that captures the correlations of the fused modalities, while isolating gross errors in the data without making any assumptions regarding Gaussianity. We show experimentally that RCCA is more appropriate than other standard fusion techniques (such as ℓ2-CCA and feature-level fusion), since it both captures interactions between modalities and decontaminates the obtained subspace from gross errors, which are prevalent in real-world problems.
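For readers unfamiliar with the ℓ2-CCA baseline mentioned in the abstract, the following is a minimal NumPy sketch of classical CCA used for audio-visual fusion. It is not the authors' RCCA: the robust low-rank formulation with explicit gross-error isolation is not reproduced here. The function name `cca`, the small ridge term `reg`, the synthetic "audio" and "visual" features, and the choice to fuse by concatenating canonical variates are all illustrative assumptions. The sketch assumes the two feature matrices are row-aligned, one row per frame.

```python
import numpy as np


def cca(X, Y, k, reg=1e-6):
    """Classical (l2) CCA between row-aligned feature matrices X (n x dx) and Y (n x dy).

    Returns projection matrices Wx (dx x k), Wy (dy x k) and the top-k canonical
    correlations. `reg` is a small ridge term (an assumption) for numerical stability.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):
        # Symmetric inverse square root via eigendecomposition.
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Canonical correlations are the singular values of the whitened cross-covariance.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :k]
    Wy = Ky @ Vt.T[:, :k]
    return Wx, Wy, s[:k]


# Synthetic example: one latent signal shared by both modalities, plus noise dimensions.
rng = np.random.default_rng(0)
n = 500
z = rng.standard_normal((n, 1))
audio = np.hstack([z + 0.1 * rng.standard_normal((n, 1)), rng.standard_normal((n, 4))])
visual = np.hstack([rng.standard_normal((n, 3)), -z + 0.1 * rng.standard_normal((n, 1))])

Wx, Wy, rho = cca(audio, visual, k=2)
u = (audio - audio.mean(axis=0)) @ Wx  # audio canonical variates
v = (visual - visual.mean(axis=0)) @ Wy  # visual canonical variates
fused = np.hstack([u, v])  # one simple way to fuse: concatenate the projections
print(rho)
```

The first canonical correlation should be close to 1 because both modalities share the latent signal, while the second reflects only chance correlation between noise dimensions. Classical CCA of this kind is sensitive to outliers and gross errors in either modality, which is the weakness the abstract says RCCA is designed to address.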
Original language | Undefined |
---|---|
Title of host publication | Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014 |
Place of Publication | USA |
Publisher | IEEE Computer Society |
Pages | 1522-1526 |
Number of pages | 5 |
ISBN (Print) | 978-1-4799-2892-7 |
Publication status | Published - May 2014 |
Event | IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014 - Fortezza da Basso, Florence, Italy. Duration: 4 May 2014 → 9 May 2014. http://www.icassp2014.org/home.html |
Publication series
Name | |
---|---|
Publisher | IEEE Computer Society |
Conference
Conference | IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014 |
---|---|
Abbreviated title | ICASSP |
Country/Territory | Italy |
City | Florence |
Period | 4/05/14 → 9/05/14 |
Keywords
- HMI-HF: Human Factors
- EWI-25821
- EC Grant Agreement nr.: FP7/2007-2013
- EC Grant Agreement nr.: FP7/288235
- METIS-309947
- Interest Detection
- Emotion Recognition
- IR-95228
- Audio-visual Fusion
- Multi-modal Fusion