Speech-based recognition of self-reported and observed emotion in a dimensional space


Abstract

The differences between self-reported and observed emotion have only marginally been investigated in the context of speech-based automatic emotion recognition. We address this issue by comparing self-reported emotion ratings to observed emotion ratings and look at how differences between these two types of ratings affect the development and performance of automatic emotion recognizers developed with these ratings. A dimensional approach to emotion modeling is adopted: the ratings are based on continuous arousal and valence scales. We describe the TNO-Gaming Corpus that contains spontaneous vocal and facial expressions elicited via a multiplayer videogame and that includes emotion annotations obtained via self-report and observation by outside observers. Comparisons show that there are discrepancies between self-reported and observed emotion ratings which are also reflected in the performance of the emotion recognizers developed. Using Support Vector Regression in combination with acoustic and textual features, recognizers of arousal and valence are developed that can predict points in a 2-dimensional arousal-valence space. The results of these recognizers show that the self-reported emotion is much harder to recognize than the observed emotion, and that averaging ratings from multiple observers improves performance.
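The approach described in the abstract, training separate regressors for arousal and valence whose outputs together form a point in the 2-dimensional space, can be sketched with Support Vector Regression. This is a minimal illustration using scikit-learn with synthetic data; the feature dimensionality, kernel choice, and hyperparameters are assumptions, not the paper's actual configuration or the TNO-Gaming Corpus features.

```python
# Sketch: dimensional emotion recognition via two SVRs, one per dimension.
# Data here is synthetic; real inputs would be acoustic/textual feature
# vectors with continuous arousal and valence ratings as targets.
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))  # 200 utterances, 12 illustrative features
y_arousal = np.tanh(X[:, 0] + 0.1 * rng.normal(size=200))
y_valence = np.tanh(X[:, 1] + 0.1 * rng.normal(size=200))

# One recognizer per dimension, as in the abstract's setup.
arousal_model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
valence_model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
arousal_model.fit(X[:150], y_arousal[:150])
valence_model.fit(X[:150], y_valence[:150])

# Each held-out utterance maps to a predicted (arousal, valence) point.
points = np.column_stack([arousal_model.predict(X[150:]),
                          valence_model.predict(X[150:])])
print(points.shape)  # one 2-D point per test utterance
```

Averaging ratings from multiple observers before training, as the abstract reports, would simply mean replacing each target value with the mean of the observers' ratings for that utterance.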
Original language: Undefined
Pages (from-to): 1049-1063
Number of pages: 15
Journal: Speech Communication
Volume: 54
Issue number: 9
DOI: 10.1016/j.specom.2012.04.006
State: Published - Nov 2012


Keywords

  • EC Grant Agreement nr.: FP7/231287
  • EWI-22081
  • Emotion perception
  • Emotional speech
  • Emotion annotation
  • Emotion database
  • Automatic emotion recognition
  • Video games
  • Support Vector Regression
  • METIS-287943
  • Affective Computing
  • IR-80909
  • Audiovisual database
  • Emotion elicitation

Cite this

Truong, Khiet Phuong; van Leeuwen, David A.; de Jong, Franciska M.G. / Speech-based recognition of self-reported and observed emotion in a dimensional space.

In: Speech communication, Vol. 54, No. 9, 11.2012, p. 1049-1063.

Research output: Scientific - peer-review › Article

@article{e1400c9c3f65407c9435643e0f42a1f9,
title = "Speech-based recognition of self-reported and observed emotion in a dimensional space",
abstract = "The differences between self-reported and observed emotion have only marginally been investigated in the context of speech-based automatic emotion recognition. We address this issue by comparing self-reported emotion ratings to observed emotion ratings and look at how differences between these two types of ratings affect the development and performance of automatic emotion recognizers developed with these ratings. A dimensional approach to emotion modeling is adopted: the ratings are based on continuous arousal and valence scales. We describe the TNO-Gaming Corpus that contains spontaneous vocal and facial expressions elicited via a multiplayer videogame and that includes emotion annotations obtained via self-report and observation by outside observers. Comparisons show that there are discrepancies between self-reported and observed emotion ratings which are also reflected in the performance of the emotion recognizers developed. Using Support Vector Regression in combination with acoustic and textual features, recognizers of arousal and valence are developed that can predict points in a 2-dimensional arousal-valence space. The results of these recognizers show that the self-reported emotion is much harder to recognize than the observed emotion, and that averaging ratings from multiple observers improves performance.",
keywords = "EC Grant Agreement nr.: FP7/231287, EWI-22081, Emotion perception, Emotional speech, Emotion annotation, Emotion database, Automatic emotion recognition, Video games, Support Vector Regression, METIS-287943, Affective Computing, IR-80909, Audiovisual database, Emotion elicitation",
author = "Truong, {Khiet Phuong} and {van Leeuwen}, {David A.} and {de Jong}, {Franciska M.G.}",
note = "eemcs-eprint-22081",
year = "2012",
month = "11",
doi = "10.1016/j.specom.2012.04.006",
volume = "54",
pages = "1049--1063",
journal = "Speech communication",
issn = "0167-6393",
publisher = "Elsevier",
number = "9",

}
