Listening Heads

I.A. de Kok

Research output: Thesis › PhD Thesis - Research UT, graduation UT

Abstract

This thesis explores individual differences in listening behavior and how these differences can be used in the development and evaluation of listener response prediction models for embodied conversational agents. It begins by introducing two methods for collecting multiple perspectives on listening behavior. The first records the listening behavior of multiple listeners interacting with the same speaker: in the MultiLis corpus, interactions between one speaker and three listeners are recorded, while all four interlocutors believe they are engaged in a one-on-one interaction. The second uses parasocial sampling: participants watch videos of recorded speakers and annotate the places where they would give a listener response if they were the actual listener in the interaction. The collected perspectives are then combined into a consensus perspective, which identifies response opportunities, defined as moments where at least one of the listeners has given a response. Conversation analysis is used to characterize both the response opportunities at which most listeners responded and those at which only one listener responded. The context of these response opportunities is analyzed in terms of the content of the speech, timing in relation to pauses, pitch and energy of the speech signal, and gaze direction of the speaker. These features are used in the listener response prediction models developed in the next part of the thesis, which are trained on the MultiLis corpus. Different ways of using the perspectives are explored in both the development and the evaluation of the models.
The methods explored are: 1) learning from a specific subset of the response opportunities, selected by the number of listeners that responded, as training or evaluation data; 2) active learning, in which observers' subjective ratings of listener behavior generated by an earlier model are used as negative training samples in the development of the subsequent model; and 3) learning a model for each speaker independently and selecting a model based on the similarity between the new speaker and the speaker the model was trained on.
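
The consensus step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say how responses from different listeners are aligned in time, so the grouping rule (a response joins a group if it falls within `window` seconds of the previous response), the default `window` value, and all function and class names are assumptions made for illustration. The sketch groups timestamped responses into response opportunities and then selects a subset by the number of listeners that responded, as in method 1.

```python
from dataclasses import dataclass


@dataclass
class ResponseOpportunity:
    start: float          # time of the earliest response in the group (seconds)
    end: float            # time of the latest response in the group (seconds)
    listeners: frozenset  # ids of the listeners who responded in this group


def _close(group):
    """Turn a list of (listener_id, time) pairs into one opportunity."""
    times = [t for _, t in group]
    ids = frozenset(listener for listener, _ in group)
    return ResponseOpportunity(min(times), max(times), ids)


def find_response_opportunities(responses, window=1.0):
    """Group (listener_id, time) pairs into response opportunities.

    Assumed rule (hypothetical, not from the thesis): a response joins
    the current group when it occurs at most `window` seconds after the
    previous response in that group; otherwise it starts a new group.
    """
    opportunities = []
    current = []
    for listener, t in sorted(responses, key=lambda r: r[1]):
        if current and t - current[-1][1] > window:
            opportunities.append(_close(current))
            current = []
        current.append((listener, t))
    if current:
        opportunities.append(_close(current))
    return opportunities


def select_by_agreement(opportunities, min_listeners):
    """Keep opportunities at which at least `min_listeners` responded."""
    return [o for o in opportunities if len(o.listeners) >= min_listeners]
```

For example, given responses from three listeners at 2.1 s, 2.4 s and 2.9 s, one response at 7.0 s, and two at 12.3 s and 12.8 s, this grouping yields three response opportunities; `select_by_agreement(..., 2)` keeps the first and the last. Because groups are chained on the previous response, a long run of closely spaced responses would merge into one opportunity, which may or may not be desirable for a given corpus.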
Language: Undefined
Awarding Institution
  • University of Twente
Supervisors/Advisors
  • Heylen, Dirk K.J., Supervisor
  • Nijholt, Antinus, Supervisor
Award date: 12 Sep 2013
Place of Publication: Enschede
Publisher: Universiteit Twente
Print ISBNs: 978-90-365-0648-9
DOIs: 10.3990/1.9789036506489
State: Published - 12 Sep 2013

Keywords

  • EWI-23932
  • IR-87077
  • METIS-297392
  • HMI-IA: Intelligent Agents

Cite this

de Kok, I. A. (2013). Listening Heads. Enschede: Universiteit Twente. DOI: 10.3990/1.9789036506489
de Kok, I.A. / Listening Heads. Enschede : Universiteit Twente, 2013. 146 p.
@phdthesis{0a799f33c352435599b428b544314a2b,
title = "Listening Heads",
keywords = "EWI-23932, IR-87077, METIS-297392, HMI-IA: Intelligent Agents",
author = "{de Kok}, I.A.",
note = "SIKS Dissertation Series No. 2013-29",
year = "2013",
month = "9",
day = "12",
doi = "10.3990/1.9789036506489",
language = "Undefined",
isbn = "978-90-365-0648-9",
publisher = "Universiteit Twente",
school = "University of Twente",

}
