Deep Temporal Models using Identity Skip-Connections for Speech Emotion Recognition

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

    11 Citations (Scopus)

    Abstract

    Deep architectures using identity skip-connections have demonstrated groundbreaking performance in image classification. Recent empirical studies suggest that identity skip-connections enable ensemble-like behaviour of shallow networks, and that depth is not the sole ingredient of their success. We therefore examine the potential of identity skip-connections for Speech Emotion Recognition (SER), a task in which moderately deep temporal architectures are often employed. To this end, we propose a novel architecture that regulates unimpeded feature flows and captures long-term dependencies via gate-based skip-connections and a memory mechanism. We compare the proposed architecture with other state-of-the-art SER methods and evaluate it on large aggregated corpora recorded in different contexts. It outperforms the state-of-the-art methods by 9 to 15% and achieves an Unweighted Accuracy of 80.5% under an imbalanced class distribution. In addition, we examine a variant that adopts the simplified skip-connections of Residual Networks (ResNet) and show that gate-based skip-connections are more effective than simplified ones.
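    The abstract contrasts gate-based skip-connections with the simplified identity skip-connections of ResNet, and reports Unweighted Accuracy (UA) on imbalanced data. The NumPy sketch below illustrates these ideas under stated assumptions. The gated layer uses the highway-network formulation y = T(x) * H(x) + (1 - T(x)) * x, the classic gate-based skip-connection that ResNet simplified to y = x + F(x); the paper's exact equations may differ, and its memory mechanism is not modelled here. UA is taken as the mean of per-class recall, its usual definition in SER. All function names are hypothetical.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def gated_skip(x, W_h, b_h, W_t, b_t):
    """Highway-style gate-based skip-connection (assumed formulation).

    A learned transform gate t in (0, 1) decides, per feature, how much of the
    transformed signal h to pass on versus how much of the input x to carry.
    """
    h = np.tanh(W_h @ x + b_h)    # candidate transformation H(x)
    t = sigmoid(W_t @ x + b_t)    # transform gate T(x)
    return t * h + (1.0 - t) * x  # carry gate is 1 - T(x)


def residual_skip(x, W_f, b_f):
    """ResNet-style simplified skip-connection: identity plus a residual F(x)."""
    return x + np.tanh(W_f @ x + b_f)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally regardless of size."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4
    x = rng.standard_normal(d)
    W = rng.standard_normal((d, d))
    b = np.zeros(d)

    # A strongly negative gate bias closes T(x), so the layer carries x almost unchanged.
    print(gated_skip(x, W, b, W, np.full(d, -20.0)))
    print(x)

    # Imbalanced toy labels: four samples of class 0, one sample of class 1.
    y_true = [0, 0, 0, 0, 1]
    y_pred = [0, 0, 0, 0, 0]
    print(unweighted_accuracy(y_true, y_pred))  # 0.5, while plain accuracy is 0.8
```

    With the toy labels, always predicting the majority class yields 80% plain accuracy but only 50% UA, which is why UA is the more informative figure under an imbalanced class distribution. The gate, unlike the fixed identity path of the residual variant, lets the network learn per feature how much of the input to carry forward.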
    Original language: English
    Title of host publication: MM '17
    Subtitle of host publication: Proceedings of the 2017 ACM on Multimedia Conference
    Publisher: Association for Computing Machinery (ACM)
    Pages: 1006-1013
    Number of pages: 8
    ISBN (Electronic): 978-1-4503-4906-2
    DOI: https://doi.org/10.1145/3123266.3123353
    Publication status: Published - 2017
    Event: 25th ACM Multimedia Conference, MM 2017 - Mountain View, United States
    Duration: 23 Oct 2017 to 27 Oct 2017
    Conference number: 25
    http://www.acmmm.org/2017/

    Conference

    Conference: 25th ACM Multimedia Conference, MM 2017
    Abbreviated title: MM
    Country: United States
    City: Mountain View
    Period: 23/10/17 to 27/10/17
    Internet address: http://www.acmmm.org/2017/

    Keywords

    • speech emotion detection
    • deep learning
    • recurrent neural nets

    Cite this

    Kim, J., Englebienne, G., Truong, K. P., & Evers, V. (2017). Deep Temporal Models using Identity Skip-Connections for Speech Emotion Recognition. In MM '17: Proceedings of the 2017 ACM on Multimedia Conference (pp. 1006-1013). Association for Computing Machinery (ACM). https://doi.org/10.1145/3123266.3123353