Audiovisual vocal outburst classification in noisy conditions

Florian Eyben, Stavros Petridis, Björn Schuller, Maja Pantic

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

    Abstract

    In this study, we investigate an audiovisual approach for classification of vocal outbursts (non-linguistic vocalisations) in noisy conditions using Long Short-Term Memory (LSTM) Recurrent Neural Networks and Support Vector Machines. Fusion of geometric shape features and acoustic low-level descriptors is performed on the feature level. Three different types of acoustic noise are considered: babble, office, and street noise. Experiments are conducted on every noise type to assess the benefit of the fusion in each case. The INTERSPEECH 2010 Paralinguistic Challenge's Audiovisual Interest Corpus of natural human-to-human conversation serves as the evaluation database. The results show that even when training is performed on noise-corrupted audio that matches the test conditions, the addition of visual features is still beneficial.
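    As an illustration of what fusion "on the feature level" can mean, the following minimal sketch concatenates per-frame acoustic and visual feature vectors into one vector before any classifier sees them. It assumes both streams are already aligned to the same frame rate; the function name `fuse_frames` and the toy values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fuse_frames(
    audio: Sequence[Sequence[float]],
    visual: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate frame-aligned audio and visual feature vectors.

    Assumption: both streams have one vector per frame and the same
    number of frames (i.e. they were resampled to a common frame rate).
    """
    if len(audio) != len(visual):
        raise ValueError("audio and visual streams must have the same number of frames")
    # Feature-level (early) fusion: one joint vector per frame.
    return [list(a) + list(v) for a, v in zip(audio, visual)]


# Toy example: 2 frames, 2 acoustic descriptors and 1 shape feature each.
audio_feats = [[1.0, 2.0], [3.0, 4.0]]
visual_feats = [[0.5], [0.6]]
fused = fuse_frames(audio_feats, visual_feats)
```

    The fused sequence would then be passed to a single classifier, in contrast to decision-level fusion, where each modality is classified separately and the outputs are combined.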
    Original language: Undefined
    Title of host publication: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2012
    Place of Publication: USA
    Publisher: IEEE Computer Society
    Pages: 5097-5100
    Number of pages: 4
    ISBN (Print): 978-1-4673-0045-2
    DOIs: https://doi.org/10.1109/ICASSP.2012.6289067
    Publication status: Published - 25 Mar 2012
    Event: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2012 - Kyoto, Japan
    Duration: 25 Mar 2012 to 30 Mar 2012

    Publication series

    Name
    Publisher: IEEE Computer Society
    ISSN (Print): 1520-6149

    Conference

    Conference: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2012
    Abbreviated title: ICASSP
    Country: Japan
    City: Kyoto
    Period: 25/03/12 to 30/03/12

    Keywords

    • EWI-23055
    • METIS-296292
    • IR-84320
    • HMI-MI: MULTIMODAL INTERACTIONS

    Cite this

    Eyben, F., Petridis, S., Schuller, B., & Pantic, M. (2012). Audiovisual vocal outburst classification in noisy conditions. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2012 (pp. 5097-5100). USA: IEEE Computer Society. https://doi.org/10.1109/ICASSP.2012.6289067