Bimodal Log-linear Regression for Fusion of Audio and Visual Features

Ognjen Rudovic, Stavros Petridis, Maja Pantic

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

    9 Citations (Scopus)
    17 Downloads (Pure)

    Abstract

    One of the most commonly used audiovisual fusion approaches is feature-level fusion, where the audio and visual features are concatenated. Although this approach has been used successfully in several applications, it does not take into account interactions between the features, which can be a problem when one or both modalities have noisy features. In this paper, we investigate whether feature fusion based on explicit modelling of interactions between audio and visual features can improve on a classifier that fuses the audio-visual features by simple concatenation. To this end, we propose a log-linear model, named Bimodal Log-linear Regression, which accounts for interactions between the features of the two modalities. The performance of the target classifiers is measured on the task of laughter-vs-speech discrimination, since both laughter and speech are naturally audiovisual events. Our experiments on the MAHNOB laughter database suggest that feature fusion based on explicit modelling of interactions between the audio-visual features leads to an improvement of 3% over the standard feature concatenation approach when a log-linear model is used as the base classifier. Finally, the most and least influential features can be easily identified by observing their interactions.
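The contrast the abstract draws between plain concatenation and explicit interaction modelling can be illustrated with a small sketch. This is not the authors' Bimodal Log-linear Regression; it only assumes, for illustration, that interactions are represented as pairwise products of audio and visual features fed to a binary log-linear (logistic) model. The function names, feature sizes, and random weights are all hypothetical.

```python
import numpy as np

def concat_features(audio, visual):
    """Plain feature-level fusion: stack the two feature vectors."""
    return np.concatenate([audio, visual])

def interaction_features(audio, visual):
    """Concatenation plus all pairwise audio-visual products a_i * v_j.

    Assumption for illustration only: interactions are modelled as
    second-order cross terms between the two modalities.
    """
    cross = np.outer(audio, visual).ravel()
    return np.concatenate([audio, visual, cross])

def log_linear_prob(features, weights, bias=0.0):
    """Binary log-linear (logistic) model: P(y=1 | x) = sigmoid(w.x + b)."""
    score = features @ weights + bias
    return 1.0 / (1.0 + np.exp(-score))

# Toy inputs, made up for illustration (not MAHNOB features).
rng = np.random.default_rng(0)
audio = rng.normal(size=3)    # hypothetical 3-dim audio feature vector
visual = rng.normal(size=2)   # hypothetical 2-dim visual feature vector

x_concat = concat_features(audio, visual)      # length 3 + 2 = 5
x_inter = interaction_features(audio, visual)  # length 5 + 3 * 2 = 11

# Random weights stand in for parameters that would normally be learned.
w = rng.normal(size=x_inter.size)
p = log_linear_prob(x_inter, w)

# The weights on the cross terms form a 3 x 2 matrix W; their contribution
# to the score is the bilinear form audio^T W visual.
W = w[5:].reshape(3, 2)
bilinear = audio @ W @ visual
```

In this sketch, setting every entry of `W` to zero recovers the plain concatenation model, and the magnitude of an entry `W[i, j]` indicates how strongly audio feature `i` and visual feature `j` interact, which is one way to read the abstract's remark about identifying influential features.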
    Original language: Undefined
    Title of host publication: Proceedings of the 21st ACM international conference on Multimedia, MM 2013
    Place of Publication: New York
    Publisher: Association for Computing Machinery (ACM)
    Pages: 789-792
    Number of pages: 4
    ISBN (Print): 978-1-4503-2404-5
    DOIs: https://doi.org/10.1145/2502081.2502207
    Publication status: Published - 21 Oct 2013
    Event: 21st ACM Multimedia Conference, MM 2013 - Barcelona, Spain
    Duration: 21 Oct 2013 to 25 Oct 2013
    Conference number: 21
    http://acmmm13.org/general-info/about-acm-multimedia-2013/

    Publication series

    Name
    Publisher: ACM

    Conference

    Conference: 21st ACM Multimedia Conference, MM 2013
    Abbreviated title: MM
    Country: Spain
    City: Barcelona
    Period: 21/10/13 to 25/10/13
    Internet address

    Keywords

    • HMI-HF: Human Factors
    • EC Grant Agreement nr.: FP7/288235
    • EC Grant Agreement nr.: FP7/2007-2013
    • METIS-302619
    • EWI-24260
    • IR-89328
    • EC Grant Agreement nr.: ERC-2007-STG-203143 (MAHNOB)

    Cite this

    Rudovic, O., Petridis, S., & Pantic, M. (2013). Bimodal Log-linear Regression for Fusion of Audio and Visual Features. In Proceedings of the 21st ACM international conference on Multimedia, MM 2013 (pp. 789-792). New York: Association for Computing Machinery (ACM). https://doi.org/10.1145/2502081.2502207