Abstract
Virtual environments inhabited by virtual humans are now commonplace in
many applications, particularly in (serious) games.
These virtual humans interact with other (virtual) humans and their
surroundings. For such interactions, detailed \emph{control} over
their behavior is crucial. The control requirements for virtual humans range
from physical interaction with the environment to tight coordination with a
human interaction partner. Furthermore, the behavior of virtual humans should
\emph{look} realistic. Throughout this thesis the term \emph{naturalness}
is used for such perceived realism.
Many techniques exist for the real-time animation of virtual humans. These
techniques differ in the trade-off they offer between the control that can be
exerted over the motion, the naturalness of the motion, and the required
computation time. Choosing the right technique depends on the requirements of
the application in which it is used.
Motion (capture) editing techniques employ the detail of captured motion or the
talent of skilled animators, but they allow little deviation from the captured examples
and can lack physical realism. Procedural motion offers detailed and precise
control using a large number of parameters, but lacks naturalness. Physical
simulation provides integration with the physical environment and physical
realism. However, physical realism alone is not enough for naturalness and
physical simulation offers poor precision in both movement timing and limb placement.
Hybrid animation techniques combine and concatenate
motion generated by different animation paradigms to enhance both naturalness
and control.
This thesis contributes one such hybrid technique: mixed dynamics.
It combines the physical naturalness provided by physically
realistic animation with the control provided by procedural animation.
It builds on the notion that the requirements of physical integrity and tight
temporal synchronization are often of different importance for different body
parts. For example, for a gesturing virtual human, tight synchronization with
speech is primarily important for arm and head movement. At the same time, a physically
valid balancing motion of the whole body could be achieved by moving only the
lower body, where precise timing is less important.
Mixed dynamics allows one to mix procedural arm and head gestures with physical
simulation of the rest of the body. The forces generated by the gesturing body parts are
transferred to the physically simulated body parts, thus creating whole body
animation that appears to respect the laws of physics in a believable
manner and that is internally coherent (that is: the movement of the physically
steered body parts is affected by the movement of the procedurally steered
ones).
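The core of this force transfer can be sketched with a standard result from rigid body mechanics (a sketch in notation of my own choosing, not necessarily that of the thesis). The wrench that the physically simulated body must exert at the attachment point $\mathbf{p}$ of a kinematically steered chain to produce the chain's motion is
\begin{align*}
\mathbf{F} &= \sum_{j \in \text{chain}} m_j\,(\mathbf{a}_j - \mathbf{g}),\\
\boldsymbol{\tau} &= \sum_{j \in \text{chain}} \Bigl( (\mathbf{c}_j - \mathbf{p}) \times m_j\,(\mathbf{a}_j - \mathbf{g}) + I_j\,\dot{\boldsymbol{\omega}}_j + \boldsymbol{\omega}_j \times I_j\,\boldsymbol{\omega}_j \Bigr),
\end{align*}
where $m_j$, $I_j$, $\mathbf{c}_j$, $\mathbf{a}_j$ and $\boldsymbol{\omega}_j$ denote the mass, world-frame inertia tensor, center of mass, linear acceleration and angular velocity of chain segment $j$. By Newton's third law, the reaction $(-\mathbf{F}, -\boldsymbol{\tau})$ is applied to the physically simulated part at $\mathbf{p}$, so that, for example, a vigorous arm gesture visibly perturbs the balance of the simulated lower body.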
Traditionally, interaction with virtual humans was designed using
`transmitter\slash receiver' interaction paradigms, in which the user and the virtual
human take turns to transmit (encode) and receive (decode) messages carrying
meaning that travel across channels between them. Such an interaction model is insufficient to capture the richness of human-human interaction (including
conversation). Natural interaction requires a \emph{continuous} interaction
paradigm, in which actors continuously perceive the acts and speech of others,
and in which they can act continuously and simultaneously, overlapping in time.
Such continuous interaction requires that the perception capabilities of the
virtual human are fast and provide incremental interpretation of another
agent's behavior, interpretations that may be extended and revised over time.
To be able to deal with such continuously updated
interpretations and rapid observations, the multimodal output generation modules
of the virtual humans should be capable of flexible production of behavior.
This includes adding or removing behavior elements on short notice,
coordinating behavior with predicted interlocutor events and adapting behavior
elements that have already been scheduled or are currently playing. This thesis
deals with the specification and execution of such flexible multimodal output.
The Behavior Markup Language (BML) has become the de facto standard for the
specification of the synchronized motor behavior (including speech and gesture)
of virtual humans. BML is interpreted by a \emph{BML Realizer}, which executes the specified behavior through the virtual
human it controls. Continuous interaction applications with virtual humans pose
several generic requirements on the specification of behavior execution, beyond
the multimodal internal (that is, within the virtual human) synchronization and form
descriptions provided by BML.
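As an illustration of what such a specification looks like, the following sketch sends a BML block in which a beat gesture's stroke is constrained to coincide with a synchronization point inside the speech. The BML markup follows the standard conventions; the surrounding Java interface is a stand-in assumption, not Elckerlyc's actual API.

```java
// Schematic BML example: speech with a beat gesture whose stroke is
// synchronized to a point inside the sentence. The BMLRealizer interface is a
// stand-in assumption for illustration, not Elckerlyc's actual API.
interface BMLRealizer {
    void performBML(String bmlBlock);
}

class BMLExample {
    static void send(BMLRealizer realizer) {
        String bml = """
            <bml id="bml1">
              <speech id="speech1"><text>This is <sync id="s1"/>important.</text></speech>
              <gesture id="gesture1" lexeme="BEAT" stroke="speech1:s1"/>
            </bml>""";
        realizer.performBML(bml); // the Realizer schedules, then executes the block
    }
}
```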
Continuous interaction requires specification mechanisms
for the interruption of ongoing behavior, the change of the shape of ongoing
behavior (e.g. speak louder) and the synchronization of behavior with predicted
external time events (e.g. originating from the interlocutor). This thesis contributes BML Twente (BML\textsuperscript{T}),
a language that extends BML by providing the \emph{specification} of
the continuous interaction capabilities discussed above. It thus provides a generic interface to a Realizer through which continuous
interaction can be realized.
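These three capabilities can be sketched as operations on a running Realizer (the method names below are my paraphrase of the abstract above, not the actual BML\textsuperscript{T} element syntax, where they are expressed as markup constructs):

```java
// The three continuous-interaction capabilities that BML^T adds, sketched as
// operations on a running realizer. Names are illustrative assumptions; in
// BML^T itself these are expressed as specification (markup) constructs.
interface ContinuousRealizer {
    void performBML(String bmlBlock);

    // Interruption of ongoing behavior, e.g. stop an utterance mid-sentence.
    void interrupt(String bmlBlockId);

    // Changing the shape of ongoing behavior, e.g. "speak louder".
    void setParameterValue(String behaviorId, String parameter, float value);

    // Synchronization to predicted external time events: behaviors scheduled
    // against this event are re-timed whenever the prediction is revised.
    void updatePredictedEventTime(String eventId, double newPredictedTime);
}
```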
``Elckerlyc'' is designed as a BML Realizer for generating multimodal verbal and
nonverbal behavior for virtual humans. The main design characteristics
of Elckerlyc are that (1) it is designed specifically for \emph{continuous interaction} with tight coordination
between the behavior of a virtual human and that of its interaction partner; (2)
it provides an \emph{adjustable trade-off between the control and naturalness}
offered by different animation paradigms (e.g. procedural body
animation and physical body animation; \mbox{MPEG-4} facial animation and morph-based facial
animation), allowing the execution of the paradigms simultaneously; and (3) it is
designed to be highly \emph{modular and extensible} and allows adaptations and
extensions of the capabilities of the virtual human, without having to make
invasive modifications to Elckerlyc itself.
A BML Realizer is responsible for
executing the behaviors specified in the BML blocks sent to it, in such a way
that the time constraints specified in the BML blocks are satisfied. Realizer
implementations, including Elckerlyc, handle this by separating the BML scheduling process from the behavior execution process.
The scheduling process is responsible for creating a multimodal behavior plan that is in a
suitable form for execution.
In most BML Realizers the scheduling of BML results in a
\emph{rigid} multimodal realization plan in which the timing of all behaviors is fixed.
In Elckerlyc, however, continuous interaction requirements dictate a
multimodal behavior plan that is modified continually at execution time.
Such modifications should not invalidate the time constraints between, for
example, speech and gesture that are specified in BML or result in
biologically infeasible behavior. Elckerlyc contributes a \emph{flexible} multimodal plan
representation that allows plan modification, while retaining timing and
naturalness constraints.
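A much-simplified sketch of such a representation, in the spirit of Elckerlyc's symbolic synchronization points (TimePegs): behaviors reference shared pegs instead of storing absolute times, so moving a peg re-times every behavior attached to it while their mutual constraints remain satisfied. All names and details below are simplified for illustration.

```java
// Simplified sketch of a flexible plan representation with shared symbolic
// synchronization points (in the spirit of Elckerlyc's TimePegs; names and
// details are simplified for illustration).
class TimePeg {
    private double globalTime;
    TimePeg(double t) { globalTime = t; }
    double getTime() { return globalTime; }
    void setTime(double t) { globalTime = t; } // re-times all behaviors using this peg
}

class PlannedBehavior {
    final String id;
    final TimePeg start, end;
    PlannedBehavior(String id, TimePeg start, TimePeg end) {
        this.id = id; this.start = start; this.end = end;
    }
}

class PlanExample {
    public static void main(String[] args) {
        // Gesture stroke-end and the stressed word share one peg.
        TimePeg stressPeg = new TimePeg(2.0);
        PlannedBehavior speech  = new PlannedBehavior("speech1",  new TimePeg(0.0), stressPeg);
        PlannedBehavior gesture = new PlannedBehavior("gesture1", new TimePeg(1.2), stressPeg);
        // Late plan modification, e.g. triggered by a revised prediction of an
        // interlocutor event: shift the shared peg; both behaviors stay aligned.
        stressPeg.setTime(2.5);
        System.out.println(speech.end.getTime() == gesture.end.getTime()); // true
    }
}
```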
Elckerlyc is the first BML Realizer specifically designed for continuous
interaction. It contributes flexible formalisms for both the specification and
the modification of running behavior. It pioneers the use of physical simulation
and mixed dynamics in a real-time multimodal virtual human platform. This
provides physically coherent whole body involvement, a naturalness feature that
is lacking in virtual human platforms that solely use procedural animation.
Furthermore, Elckerlyc provides a more extensible and more thoroughly tested
architecture than existing BML Realizers. Other Realizers have implemented
alternative and more elaborate scheduling algorithms, or provide motor control
on modalities that are not present in Elckerlyc (e.g. blushing), or provide specialized
behavior elements (e.g. walking). Elckerlyc's extensibility allows one to easily
implement such specialized behaviors on existing or new modalities.
Elckerlyc was also designed to allow the use of new scheduling algorithms;
the feasibility of this design feature is yet to be proven.
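The kind of plug-in point such an architecture suggests can be sketched as one planner/player per modality, registered with the realizer rather than wired into its core (interface and method names below are assumptions for illustration; Elckerlyc's actual engine interfaces differ in detail):

```java
// Sketch of a per-modality plug-in point: a new modality (e.g. blushing) or a
// specialized behavior (e.g. walking) is added by registering an engine, not
// by modifying the realizer's core. Names are illustrative assumptions.
import java.util.ArrayList;
import java.util.List;

interface ModalityEngine {
    boolean canHandle(String behaviorType); // e.g. "gesture", "walk", "blush"
    void plan(String behaviorSpec);         // add the behavior to this modality's plan
    void play(double time);                 // advance execution of the plan
}

class ExtensibleRealizer {
    private final List<ModalityEngine> engines = new ArrayList<>();

    void registerEngine(ModalityEngine e) {
        engines.add(e); // extension without invasive modification
    }

    void play(double time) {
        for (ModalityEngine e : engines) e.play(time);
    }
}
```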
Elckerlyc is employed in several virtual human applications. A number of its
design features were motivated, fine-tuned and ultimately demonstrated by this
`field' experience.
Original language | English
---|---
Qualification | Doctor of Philosophy
Award date | 9 Sept 2011
Place of Publication | Enschede, The Netherlands
Print ISBNs | 978-90-365-3233-4
Publication status | Published - 9 Sept 2011