The thesis explores individual differences in listening behavior and how these differences can be used in the development and evaluation of listener response prediction models for embodied conversational agents. The thesis starts by introducing methods to collect multiple perspectives on listening behavior. The first method records the listening behavior of multiple listeners in interaction with the same speaker. In the MultiLis corpus, interactions between one speaker and three listeners are recorded, while all four interlocutors believe they are engaged in a one-on-one interaction. The second method uses parasocial sampling to collect perspectives on listening behavior: participants watch videos of recorded speakers and annotate the places where they would give a listener response if they were the actual listener in the interaction. Next, the collected perspectives are combined into a consensus perspective, which identifies response opportunities: moments where at least one of the listeners has given a response. Through conversation analysis, the characteristics of response opportunities where most listeners responded and of those where only one listener responded are identified. The context of these response opportunities is analyzed in terms of the content of the speech, timing in relation to pauses, pitch and energy of the speech signal, and gaze direction of the speaker. These features are used in the listener response prediction models developed in the next part of the thesis. The models are trained on the MultiLis corpus. Different methods of using the perspectives are explored, both in the development phase of the models and in the evaluation stage.
The methods explored are (1) learning with a specific subset of the response opportunities, selected by the number of listeners that responded, as training or evaluation data; (2) active learning, where observers' subjective ratings of listener behavior generated by an earlier model are used as negative training samples in the development of the subsequent model; and (3) learning a model for each speaker independently and selecting a model based on the similarity between the new speaker and the speaker the model was trained on.
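To make the notion of a response opportunity concrete, the following is a minimal sketch of how responses from several listeners might be combined into a consensus perspective and then filtered by the number of listeners that responded, as in method (1). The time-window grouping, the function names, and the example timestamps are all illustrative assumptions; the abstract does not specify the procedure actually used in the thesis.

```python
from typing import Dict, List, Tuple

# (start time, end time, number of distinct listeners that responded)
Opportunity = Tuple[float, float, int]


def response_opportunities(
    responses: Dict[str, List[float]], window: float = 0.5
) -> List[Opportunity]:
    """Group response onsets (in seconds) from all listeners.

    Assumption: onsets that follow the previous onset within `window`
    seconds belong to the same response opportunity.
    """
    events = sorted(
        (t, listener) for listener, times in responses.items() for t in times
    )
    opportunities: List[Opportunity] = []
    group: List[Tuple[float, str]] = []
    for t, listener in events:
        # A gap larger than the window closes the current opportunity.
        if group and t - group[-1][0] > window:
            opportunities.append(_summarize(group))
            group = []
        group.append((t, listener))
    if group:
        opportunities.append(_summarize(group))
    return opportunities


def _summarize(group: List[Tuple[float, str]]) -> Opportunity:
    listeners = {listener for _, listener in group}
    return (group[0][0], group[-1][0], len(listeners))


def select_by_agreement(
    opportunities: List[Opportunity], min_listeners: int
) -> List[Opportunity]:
    """Keep opportunities where at least `min_listeners` responded."""
    return [o for o in opportunities if o[2] >= min_listeners]


# Hypothetical onsets for three listeners paired with the same speaker.
example = {"A": [1.0, 5.0], "B": [1.2, 9.0], "C": [1.5]}
all_opportunities = response_opportunities(example, window=0.5)
majority_opportunities = select_by_agreement(all_opportunities, min_listeners=2)
```

In this sketch every opportunity has at least one responding listener by construction, which matches the definition given above; raising `min_listeners` yields the stricter subsets that could serve as training or evaluation data.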
|Award date||12 Sep 2013|
|Place of Publication||Enschede|
|Publication status||Published - 12 Sep 2013|
- HMI-IA: Intelligent Agents