A Multimodal Analysis of Vocal and Visual Backchannels in Spontaneous Dialogs

Khiet Phuong Truong, Ronald Walter Poppe, I.A. de Kok, Dirk K.J. Heylen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Backchannels (BCs) are short vocal and visual listener responses that signal attention, interest, and understanding to the speaker. Previous studies have investigated BC prediction in telephone-style dialogs from prosodic cues. In contrast, we consider spontaneous face-to-face dialogs. The additional visual modality allows speaker and listener to monitor each other's attention continuously, and we hypothesize that this affects the BC-inviting cues. In this study, we investigate how gaze, in addition to prosody, can cue BCs. Moreover, we focus on the type of BC performed, with the aim of finding out whether vocal and visual BCs are invited by similar cues. In contrast to telephone-style dialogs, we do not find rising/falling pitch to be a BC-inviting cue. However, in a face-to-face setting, gaze appears to cue BCs. In addition, we find that mutual gaze occurs significantly more often during visual BCs. Moreover, vocal BCs are more likely to be timed during pauses in the speaker's speech.
Original language: English
Title of host publication: Proceedings of Interspeech 2011
Place of Publication: France
Publisher: International Speech Communication Association (ISCA)
Pages: 2973-2976
Number of pages: 4
Publication status: Published - Aug 2011
Event: 12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011 - Florence, Italy
Duration: 28 Aug 2011 to 31 Aug 2011
Conference number: 12

Publication series

Publisher: International Speech Communication Association
ISSN (Print): 1990-9772

Conference

Conference: 12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011
Abbreviated title: INTERSPEECH
Country: Italy
City: Florence
Period: 28/08/11 to 31/08/11

Keywords

  • METIS-279669
  • IR-78349
  • EWI-20721
  • EC Grant Agreement nr.: FP7/231287

Cite this

Truong, K. P., Poppe, R. W., de Kok, I. A., & Heylen, D. K. J. (2011). A Multimodal Analysis of Vocal and Visual Backchannels in Spontaneous Dialogs. In Proceedings of Interspeech 2011 (pp. 2973-2976). France: International Speech Communication Association (ISCA).
@inproceedings{134571f498344afa85df3d9e996b9210,
title = "A Multimodal Analysis of Vocal and Visual Backchannels in Spontaneous Dialogs",
keywords = "METIS-279669, IR-78349, EWI-20721, EC Grant Agreement nr.: FP7/231287",
author = "Truong, {Khiet Phuong} and Poppe, {Ronald Walter} and {de Kok}, I.A. and Heylen, {Dirk K.J.}",
note = "eemcs-eprint-20721",
year = "2011",
month = "8",
language = "English",
publisher = "International Speech Communication Association (ISCA)",
pages = "2973--2976",
booktitle = "Proceedings of Interspeech 2011",
}

TY - GEN
T1 - A Multimodal Analysis of Vocal and Visual Backchannels in Spontaneous Dialogs
AU - Truong, Khiet Phuong
AU - Poppe, Ronald Walter
AU - de Kok, I.A.
AU - Heylen, Dirk K.J.
N1 - eemcs-eprint-20721
PY - 2011/8
Y1 - 2011/8
KW - METIS-279669
KW - IR-78349
KW - EWI-20721
KW - EC Grant Agreement nr.: FP7/231287
M3 - Conference contribution
SP - 2973
EP - 2976
BT - Proceedings of Interspeech 2011
PB - International Speech Communication Association (ISCA)
CY - France
ER -
