Abstract
During face-to-face interactions, listeners use backchannel feedback such as head nods as a signal to the speaker that the communication is working and that they should continue speaking. Predicting these backchannel opportunities is an important milestone for building engaging and natural virtual humans. In this paper we show how sequential probabilistic models (e.g., Hidden Markov Models (HMMs) or Conditional Random Fields (CRFs)) can automatically learn from a database of human-to-human interactions to predict listener backchannels using the speaker's multimodal output features (e.g., prosody, spoken words and eye gaze). The main challenges addressed in this paper are automatic selection of the relevant features and optimal feature representation for probabilistic models. For prediction of visual backchannel cues (i.e., head nods), our prediction model shows a statistically significant improvement over a previously published approach based on hand-crafted rules.
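The abstract frames backchannel prediction as sequence labeling over the speaker's multimodal features. As a rough illustration only (not the authors' pipeline), the sketch below shows how such a CRF labeler could be set up with the sklearn-crfsuite package; the feature names (pause, pitch_slope, gaze_at_listener, word) and the toy data are invented stand-ins for the prosodic, lexical and eye-gaze cues mentioned in the abstract.

```python
# Minimal sketch, assuming sklearn-crfsuite and hypothetical per-frame features;
# this is not the paper's implementation.
import sklearn_crfsuite

def encode_frame(frame):
    """Turn one raw speaker frame into a CRF feature dictionary
    (the 'feature representation' step the abstract refers to)."""
    return {
        "pause": frame["pause"],                        # speaker currently silent?
        "pitch_falling": frame["pitch_slope"] < 0,      # binarised prosodic cue
        "gaze_at_listener": frame["gaze_at_listener"],  # speaker looks at listener
        "word=" + frame["word"]: 1.0,                   # sparse lexical feature
    }

# Toy training data: one speaker sequence with per-frame listener labels
# ("NOD" where the human listener produced a head nod, "NONE" elsewhere).
raw_sequence = [
    {"pause": 0, "pitch_slope": 0.3,  "gaze_at_listener": 0, "word": "so"},
    {"pause": 0, "pitch_slope": -0.5, "gaze_at_listener": 1, "word": "right"},
    {"pause": 1, "pitch_slope": -0.8, "gaze_at_listener": 1, "word": ""},
]
X_train = [[encode_frame(f) for f in raw_sequence]]
y_train = [["NONE", "NONE", "NOD"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, y_train)

# Predict backchannel opportunities for (here) the same toy sequence.
print(crf.predict(X_train))
```

The dictionary-based encoding in `encode_frame` is one possible answer to the feature-representation question the paper studies; which speaker cues to include, and whether to binarise or keep them continuous, is exactly the feature-selection problem the abstract highlights.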
Original language | Undefined |
---|---|
Title of host publication | Proceedings of the Eighth International Conference on Intelligent Virtual Agents 2008 |
Editors | Helmut Prendinger, James Lester, Mitsuru Ishizuka |
Place of Publication | Berlin |
Publisher | Springer |
Pages | 176-190 |
Number of pages | 15 |
ISBN (Print) | 978-3-540-85482-1 |
DOIs | |
Publication status | Published - 2008 |
Event | 8th International Conference on Intelligent Virtual Agents, IVA 2008 - Tokyo, Japan; duration: 1 Sep 2008 → 3 Sep 2008; conference number: 8 |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer Verlag |
Volume | 5208/2008 |
Conference
Conference | 8th International Conference on Intelligent Virtual Agents, IVA 2008 |
---|---|
Abbreviated title | IVA |
Country | Japan |
City | Tokyo |
Period | 1/09/08 → 3/09/08 |
Keywords
- METIS-264251
- IR-68949
- HMI-HF: Human Factors
- HMI-IA: Intelligent Agents
- EWI-17025