Arousal and Valence Prediction in Spontaneous Emotional Speech: Felt versus Perceived Emotion

Khiet Phuong Truong, David A. van Leeuwen, Mark A. Neerincx, Franciska M.G. de Jong

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic > peer-review

15 Citations (Scopus)
139 Downloads (Pure)

Abstract

In this paper, we describe emotion recognition experiments carried out on spontaneous affective speech, with the aim of comparing the added value of annotating felt emotion versus annotating perceived emotion. Using speech material available in the TNO-GAMING corpus (a corpus containing audiovisual recordings of people playing videogames), speech-based affect recognizers were developed that can predict Arousal and Valence scalar values. Two types of recognizers were developed in parallel: one trained with felt emotion annotations (generated by the gamers themselves) and one trained with perceived/observed emotion annotations (generated by a group of observers). The experiments showed that, in speech, with the methods and features currently used, observed emotions are easier to predict than felt emotions. The results suggest that recognition performance strongly depends on how and by whom the emotion annotations are carried out.
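The parallel setup described in the abstract (the same speech features, two sets of scalar targets) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the abstract does not name the regression method, features, or evaluation metric, so closed-form ridge regression, random feature vectors, and mean absolute error are used as stand-ins, and the helper names (`fit_ridge`, `compare_annotations`, `demo`) are hypothetical. The synthetic labels use the same noise level for both annotation sources, so the numbers it produces say nothing about the TNO-GAMING corpus or the paper's results.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression (stand-in model, not the paper's method).

    A constant column is appended so the last weight acts as a bias;
    for simplicity the bias is regularized like the other weights.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def predict(X, w):
    """Apply weights from fit_ridge to a feature matrix."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


def mean_abs_error(y_true, y_pred):
    """Mean absolute error between target and predicted scalar values."""
    return float(np.mean(np.abs(y_true - y_pred)))


def compare_annotations(X_train, X_test, labels_train, labels_test):
    """Train one regressor per annotation source on identical features.

    labels_train / labels_test map a source name (e.g. "felt",
    "perceived") to a vector of scalar targets such as Arousal.
    Returns the test error per source.
    """
    results = {}
    for source, y_train in labels_train.items():
        w = fit_ridge(X_train, y_train)
        results[source] = mean_abs_error(labels_test[source], predict(X_test, w))
    return results


def demo(seed=0):
    """Run the comparison on synthetic placeholder data (not corpus data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(200, 5))      # hypothetical acoustic features
    base = X @ rng.normal(size=5)      # shared underlying signal
    labels = {
        "felt": base + rng.normal(scale=1.0, size=200),
        "perceived": base + rng.normal(scale=1.0, size=200),
    }
    train, test = slice(0, 150), slice(150, 200)
    return compare_annotations(
        X[train],
        X[test],
        {k: v[train] for k, v in labels.items()},
        {k: v[test] for k, v in labels.items()},
    )
```

Calling `demo()` returns a dictionary with one error value per annotation source. Keeping the features and model fixed while swapping only the target vector is what isolates the effect of who produced the annotations.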
Original language: English
Title of host publication: Proceedings of the 10th Annual Conference of the International Speech Communication Association (Interspeech 2009)
Publisher: International Speech Communication Association (ISCA)
Pages: 2027-2030
Number of pages: 4
Publication status: Published - 2009
Event: 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009 - Brighton, United Kingdom
Duration: 6 Sep 2009 to 10 Sep 2009
Conference number: 10
http://www.interspeech2010.jpn.org/

Publication series

Name: Publications Speech Processing Group, Brno University of Technology
Publisher: International Speech Communication Association
ISSN (Print): 1990-9772

Conference

Conference: 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009
Abbreviated title: INTERSPEECH
Country: United Kingdom
City: Brighton
Period: 6/09/09 to 10/09/09
Keywords

  • IR-68948
  • EWI-17024
  • METIS-264250

Cite this

Truong, K. P., van Leeuwen, D. A., Neerincx, M. A., & de Jong, F. M. G. (2009). Arousal and Valence Prediction in Spontaneous Emotional Speech: Felt versus Perceived Emotion. In Proceedings of the 10th Annual Conference of the International Speech Communication Association (Interspeech 2009) (pp. 2027-2030). (Publications Speech Processing Group, Brno University of Technology). International Speech Communication Association (ISCA).
@inproceedings{40c17ef06e80419c9727933f7734e506,
title = "Arousal and Valence Prediction in Spontaneous Emotional Speech: Felt versus Perceived Emotion",
abstract = "In this paper, we describe emotion recognition experiments carried out for spontaneous affective speech with the aim to compare the added value of annotation of felt emotion versus annotation of perceived emotion. Using speech material available in the TNO-GAMING corpus (a corpus containing audiovisual recordings of people playing videogames), speech-based affect recognizers were developed that can predict Arousal and Valence scalar values. Two types of recognizers were developed in parallel: one trained with felt emotion annotations (generated by the gamers themselves) and one trained with perceived/observed emotion annotations (generated by a group of observers). The experiments showed that, in speech, with the methods and features currently used, observed emotions are easier to predict than felt emotions. The results suggest that recognition performance strongly depends on how and by whom the emotion annotations are carried out.",
keywords = "IR-68948, EWI-17024, METIS-264250",
author = "Truong, {Khiet Phuong} and {van Leeuwen}, {David A.} and Neerincx, {Mark A.} and {de Jong}, {Franciska M.G.}",
year = "2009",
language = "English",
series = "Publications Speech Processing Group, Brno University of Technology",
publisher = "International Speech Communication Association (ISCA)",
pages = "2027--2030",
booktitle = "Proceedings of the 10th Annual Conference of the International Speech Communication Association (Interspeech 2009)",

}