Fusion of audio and visual cues for laughter detection

Stavros Petridis, Maja Pantic

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

    18 Citations (Scopus)

    Abstract

    Past research on automatic laughter detection has focused mainly on audio-based detection. Here we present an audiovisual approach to distinguishing laughter from speech, and we show that integrating the information from the audio and video channels leads to improved performance over single-modal approaches. Each channel consists of two streams (cues): facial expressions and head movements for video, and spectral and prosodic features for audio. We used decision-level fusion to integrate the information from the two channels and experimented with the SUM rule and a neural network as the integration functions. The results indicate that even a simple linear function such as the SUM rule achieves very good performance in audiovisual fusion. We also experimented with different combinations of cues, the most informative being the facial expressions and the spectral features. The best combination of cues is the integration of facial expressions, spectral features, and prosodic features when a neural network is used as the fusion method. When tested in a person-independent way on 96 audiovisual sequences depicting spontaneously displayed (as opposed to posed) laughter and speech episodes, the proposed audiovisual approach achieves a recall rate of over 90% and a precision of over 80%.
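As a rough illustration of the decision-level fusion described in the abstract, the following minimal sketch applies a SUM rule to the per-class scores of two single-channel classifiers and then computes precision and recall for the laughter class. It is not the authors' implementation: the function names, the two-class score layout, and the example scores are all assumptions made for illustration, and the neural-network fusion variant is not shown.

```python
from typing import List, Sequence, Tuple

# Hypothetical class order for the per-class scores (an assumption of this sketch).
LABELS = ("laughter", "speech")


def sum_rule_fusion(audio_scores: Sequence[float],
                    video_scores: Sequence[float]) -> List[float]:
    """Add the per-class scores of the audio and video classifiers."""
    if len(audio_scores) != len(video_scores):
        raise ValueError("both classifiers must score the same classes")
    return [a + v for a, v in zip(audio_scores, video_scores)]


def classify(audio_scores: Sequence[float],
             video_scores: Sequence[float]) -> str:
    """Return the label with the highest fused score."""
    fused = sum_rule_fusion(audio_scores, video_scores)
    best = max(range(len(fused)), key=fused.__getitem__)
    return LABELS[best]


def precision_recall(predicted: Sequence[str],
                     actual: Sequence[str],
                     positive: str = "laughter") -> Tuple[float, float]:
    """Compute precision and recall for the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up scores: audio weakly favours speech, video strongly favours laughter.
    print(classify([0.4, 0.6], [0.9, 0.1]))
```

In this sketch the fused decision can differ from either single-channel decision when one channel is much more confident than the other, which is the kind of complementarity the abstract attributes to audiovisual fusion.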
    Original language: Undefined
    Title of host publication: Proceedings of the 2008 International Conference on Content-Based Image and Video Retrieval (CIVR'08)
    Place of Publication: New York
    Publisher: Association for Computing Machinery (ACM)
    Pages: 329-337
    Number of pages: 9
    ISBN (Print): 978-1-60558-070-8
    Publication status: Published - Jul 2008
    Event: 2008 International Conference on Content-Based Image and Video Retrieval (CIVR'08) - Niagara Falls, Canada
    Duration: 7 Jul 2008 to 9 Jul 2008

    Publication series

    Publisher: ACM
    Number: 2008/16200

    Conference

    Conference: 2008 International Conference on Content-Based Image and Video Retrieval (CIVR'08)
    Period: 7/07/08 to 9/07/08
    Other: 7-9 July 2008

    Keywords

    • EC Grant Agreement nr.: FP7/211486
    • EWI-14810
    • HMI-MI: MULTIMODAL INTERACTIONS
    • Audiovisual data processing
    • METIS-255087
    • laughter detection
    • EC Grant Agreement nr.: FP6/0027787
    • IR-62669
    • Nonlinguistic Information Processing
