String-based audiovisual fusion of behavioural events for the assessment of dimensional affect

Florian Eyben, Martin Wöllmer, Michel F. Valstar, Hatice Gunes, Björn Schuller, Maja Pantic

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

    40 Citations (Scopus)


    The automatic assessment of affect is mostly based on feature-level approaches, such as distances between facial points or prosodic and spectral information when it comes to audiovisual analysis. However, it is known and intuitive that behavioural events such as smiles, head shakes, laughter and sighs also bear highly relevant information regarding a subject's affective display. Accordingly, we propose a novel string-based prediction approach to fuse such events and to predict human affect in a continuous dimensional space. Extensive analysis and evaluation have been conducted using the newly released SEMAINE database of human-to-agent communication. For a thorough understanding of the obtained results, we provide additional benchmarks using more conventional feature-level modelling, and compare these and the string-based approach to a fusion of signal-based features and string-based events. Our experimental results show that the proposed string-based approach is the best-performing approach for automatic prediction of the Valence and Expectation dimensions, and that it improves prediction performance for the other dimensions when combined with at least acoustic signal-based features.
    Original language: Undefined
    Title of host publication: IEEE International Conference on Automatic Face & Gesture Recognition and Workshops (FG 2011)
    Place of Publication: USA
    Number of pages: 8
    ISBN (Print): 978-1-4244-9140-7
    Publication status: Published - Mar 2011
    Event: 9th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2011 - Santa Barbara, United States
    Duration: 21 Mar 2011 → 25 Mar 2011
    Conference number: 9

    Publication series

    Publisher: IEEE Computer Society


    Conference: 9th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2011
    Abbreviated title: FG
    Country/Territory: United States
    City: Santa Barbara


    • METIS-285032
    • IR-79436
    • Databases
    • Visualization
    • Feature extraction
    • Hidden Markov models
    • Pixel
    • Speech
    • EWI-21332
    • EC Grant Agreement nr.: FP7/231287
    • EC Grant Agreement nr.: FP7/211486
    • Face
