# Behavior Generation for Interpersonal Coordination with Virtual Humans: on Specifying, Scheduling and Realizing Multimodal Virtual Human Behavior

H. van Welbergen

Research output: Thesis › PhD Thesis - Research UT, graduation UT

## Abstract

Virtual environments inhabited by virtual humans are now commonplace in many applications, particularly in (serious) games. These virtual humans interact with other (virtual) humans and their surroundings. For such interactions, detailed *control* over their behavior is crucial. The control requirements for virtual humans range from providing physical interaction with the environment to providing tight coordination with a human interaction partner. Furthermore, the behavior of virtual humans should *look* realistic. Throughout this thesis the term *naturalness* is used for such perceived realism.

Many techniques achieve real-time animation. These techniques differ in the trade-off they offer between the control that can be exerted over the motion, the motion naturalness, and the required calculation time. Choosing the right technique depends on the requirements of the application it is used in. Motion (capture) editing techniques employ the detail of captured motion or the talent of skilled animators, but they allow little deviation from the captured examples and can lack physical realism. Procedural motion offers detailed and precise control using a large number of parameters, but lacks naturalness. Physical simulation provides integration with the physical environment and physical realism. However, physical realism alone is not enough for naturalness, and physical simulation offers poor precision in both movement timing and limb placement.

Hybrid animation techniques combine and concatenate motion generated by different animation paradigms to enhance both naturalness and control. This thesis contributes one such hybrid technique: mixed dynamics. It combines the physical naturalness provided by physically realistic animation with the control provided by procedural animation. It builds on the notion that the requirements of physical integrity and tight temporal synchronization are often of different importance for different body parts.
For example, for a gesturing virtual human, tight synchronization with speech is primarily important for arm and head movement. At the same time, a physically valid balancing motion of the whole body could be achieved by moving only the lower body, where precise timing is less important. Mixed dynamics allows one to mix procedural arm and head gestures with physical simulation of the rest of the body. The forces generated by the gesturing body parts are transferred to the physically simulated body parts, thus creating whole body animation that appears to respect the laws of physics in a believable manner and that is internally coherent (that is: the movement of the physically steered body parts is affected by the movement of the procedurally steered ones).

Traditionally, interaction with virtual humans was designed using 'transmitter/receiver' interaction paradigms, in which the user and the virtual human take turns to transmit (encode) and receive (decode) messages carrying meaning that travel across channels between them. Such an interaction model is insufficient to capture the richness of human-human interaction (including conversation). Natural interaction requires a *continuous* interaction paradigm, where actors perceive the acts and speech of others continuously, and where actors can act continuously and simultaneously, therefore overlapping in time. Such continuous interaction requires that the perception capabilities of the virtual human are fast and provide incremental interpretation of another agent's behavior. These interpretations are possibly extended and revised over time. To be able to deal with such continuously updated interpretations and rapid observations, the multimodal output generation modules of the virtual human should be capable of flexible production of behavior.
This includes adding or removing behavior elements at a late time, coordinating behavior with predicted interlocutor events, and adapting behavior elements that have already been scheduled or are currently playing. This thesis deals with the specification and execution of such flexible multimodal output.

The Behavior Markup Language (BML) has become the de facto standard for the specification of the synchronized motor behavior (including speech and gesture) of virtual humans. BML is interpreted by a *BML Realizer*, which executes the specified behavior through the virtual human it controls. Continuous interaction applications with virtual humans pose several generic requirements on the specification of behavior execution, beyond the multimodal internal (that is, within the virtual human) synchronization and the form descriptions provided by BML. Continuous interaction requires specification mechanisms for the interruption of ongoing behavior, the change of the shape of ongoing behavior (e.g. speak louder), and the synchronization of behavior with predicted external time events (e.g. originating from the interlocutor). This thesis contributes BML Twente (BML<sup>T</sup>), a language that extends BML with the *specification* of the continuous interaction capabilities discussed above. It thus provides a generic interface to a Realizer through which continuous interaction can be realized.

"Elckerlyc" is designed as a BML Realizer for generating multimodal verbal and nonverbal behavior for virtual humans. The main design characteristics of Elckerlyc are that (1) it is designed specifically for *continuous interaction* with tight coordination between the behavior of a virtual human and that of its interaction partner; (2) it provides an *adjustable trade-off between the control and naturalness* offered by different animation paradigms (e.g.
procedural body animation and physical body animation; MPEG-4 facial animation and morph-based facial animation), allowing the execution of the paradigms simultaneously; and (3) it is designed to be highly *modular and extensible*, allowing adaptations and extensions of the capabilities of the virtual human without invasive modifications to Elckerlyc itself.

A BML Realizer is responsible for executing the behaviors specified in the BML blocks sent to it, in such a way that the time constraints specified in the BML blocks are satisfied. Realizer implementations, including Elckerlyc, handle this by separating the BML scheduling process from the behavior execution process. The scheduling process is responsible for creating a multimodal behavior plan that is in a suitable form for execution. In most BML Realizers, the scheduling of BML results in a *rigid* multimodal realization plan in which the timing of all behaviors is fixed. In Elckerlyc, however, continuous interaction requirements dictate a multimodal behavior plan that is modified continually at execution time. Such modifications should not invalidate the time constraints between, for example, speech and gesture that are specified in BML, or result in biologically infeasible behavior. Elckerlyc contributes a *flexible* multimodal plan representation that allows plan modification while retaining timing and naturalness constraints.

Elckerlyc is the first BML Realizer specifically designed for continuous interaction. It contributes flexible formalisms for both the specification and the modification of running behavior. It pioneers the use of physical simulation and mixed dynamics in a real-time multimodal virtual human platform. This provides physically coherent whole body involvement, a naturalness feature that is lacking in virtual human platforms that solely use procedural animation.
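The force-transfer idea behind mixed dynamics can be sketched in a few lines. This is a minimal, hypothetical illustration, not Elckerlyc's actual implementation: the single-joint model, the inertia values, and the procedural controller are all assumptions made for the sketch. A kinematically (procedurally) driven arm requires a shoulder torque; the equal-and-opposite reaction torque is applied to a physically simulated trunk, so the body visibly compensates for the gesture.

```python
import math

# Assumed toy parameters: a procedurally driven arm attached to a
# physically simulated trunk (illustrative values, not from Elckerlyc).
ARM_INERTIA = 0.5    # kg*m^2, arm inertia about the shoulder
TRUNK_INERTIA = 4.0  # kg*m^2, trunk inertia about the hip
DT = 0.01            # integration time step (s)

def procedural_arm_accel(t):
    """Procedural controller: a smooth beat gesture, fully timed in advance."""
    return math.sin(2.0 * math.pi * t)  # prescribed angular acceleration

def simulate(duration):
    """Run the mixed-dynamics loop and return the final trunk angle."""
    trunk_angle, trunk_vel = 0.0, 0.0
    t = 0.0
    while t < duration:
        # Inverse dynamics for the kinematic arm: torque needed at the shoulder.
        arm_torque = ARM_INERTIA * procedural_arm_accel(t)
        # Mixed dynamics: the reaction torque (equal and opposite) acts on the
        # physically simulated trunk, which is integrated forward in time.
        trunk_acc = -arm_torque / TRUNK_INERTIA
        trunk_vel += trunk_acc * DT
        trunk_angle += trunk_vel * DT
        t += DT
    return trunk_angle

# The trunk moves even though only the arm is procedurally controlled:
# the whole-body motion is internally coherent.
```

In the thesis, the same principle is applied per body part: gesturing limbs keep precise procedural timing, while the rest of the body reacts under physical simulation to the forces they generate.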
Furthermore, Elckerlyc provides a more extensible and more thoroughly tested architecture than existing BML Realizers. Other Realizers have implemented alternative and more elaborate scheduling algorithms, provide motor control on modalities that are not present in Elckerlyc (e.g. blushing), or provide specialized behavior elements (e.g. walking). Elckerlyc's extensibility allows one to easily add such specialized behaviors, on existing or new modalities, to Elckerlyc. Elckerlyc was also designed to allow the use of new scheduling algorithms; the feasibility of this design feature is yet to be proven. Elckerlyc is employed in several virtual human applications. Several of its design features were motivated, fine-tuned and finally demonstrated by this 'field' experience with Elckerlyc.
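For concreteness, a BML block as consumed by a Realizer looks roughly as follows. This is a hedged sketch following the BML 1.0 draft as commonly documented; the behavior ids and the sync-point name are illustrative, not taken from the thesis:

```xml
<bml xmlns="http://www.bml-initiative.org/bml/bml-1.0" id="bml1">
  <!-- Speech with a named synchronization point inside the text. -->
  <speech id="s1" start="0">
    <text>Look at <sync id="point1"/>this.</text>
  </speech>
  <!-- The gesture stroke is constrained to the speech sync point. -->
  <gesture id="g1" lexeme="BEAT" stroke="s1:point1"/>
  <!-- A head nod starting at the gesture's stroke phase. -->
  <head id="h1" lexeme="NOD" start="g1:stroke"/>
</bml>
```

BML<sup>T</sup> layers additional mechanisms on top of such blocks, for interrupting ongoing behavior, changing parameters of running behavior, and synchronizing to predicted external events; the exact extension syntax is defined in the thesis itself.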
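The flexible multimodal plan representation can be illustrated with a toy sketch, loosely inspired by Elckerlyc's time-peg idea; the class names and times here are invented for illustration. Behaviors reference shared symbolic time pegs rather than storing absolute times, so a late update to one peg retimes every behavior linked to it while the declared synchronization constraints keep holding:

```python
class TimePeg:
    """A shared, mutable symbolic time point in the multimodal plan."""
    def __init__(self, time):
        self.time = time

class PlannedBehavior:
    """A scheduled behavior whose timing is resolved through pegs."""
    def __init__(self, name, start_peg, end_peg):
        self.name = name
        self.start_peg = start_peg
        self.end_peg = end_peg

    def interval(self):
        # Timing is read from the pegs at execution time, never cached.
        return (self.start_peg.time, self.end_peg.time)

# Speech and gesture share the peg at the relevant sync point.
sync = TimePeg(1.5)
end = TimePeg(3.0)
speech = PlannedBehavior("speech1", TimePeg(0.0), end)
gesture = PlannedBehavior("gesture1", sync, end)

# Late update, e.g. a revised prediction of an interlocutor event:
# moving the shared peg retimes the gesture without rescheduling the plan.
sync.time = 2.0
```

The design choice is that constraints are kept alive in the plan rather than compiled away during scheduling, which is what allows running behavior to be stretched, shifted, or interrupted without invalidating the BML time constraints.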
- Original language: Undefined
- Awarding institution: University of Twente
- Supervisors/Advisors: Nijholt, Antinus (Supervisor); Ruttkay, Z.M. (Advisor); Reidsma, Dennis (Advisor)
- Sponsors: GATE
- Award date: 9 Sep 2011
- Place of publication: Enschede, The Netherlands
- Publisher: University of Twente
- ISBN: 978-90-365-3233-4
- DOI: https://doi.org/10.3990/1.9789036532334
- Publication status: Published - 9 Sep 2011

## Keywords

• METIS-278839
• HMI-IA: Intelligent Agents
• EWI-20609
• IR-77917