Passing a Non-verbal Turing Test: Evaluating Gesture Animations Generated from Speech

Manuel Rebol, Christian Gütl, Krzysztof Pietroszek
2021 IEEE Virtual Reality and 3D User Interfaces (VR)
In real life, people communicate using both speech and non-verbal signals such as gestures, facial expressions, and body pose. Non-verbal signals affect the meaning of a spoken utterance in many ways, and their absence impoverishes communication. Yet, when users are represented as avatars, it is difficult to translate non-verbal signals along with the speech into the virtual world without specialized motion-capture hardware. In this paper, we propose a…

A Comprehensive Review of Data-Driven Co-Speech Gesture Generation

Key research challenges in gesture generation are identified, including data availability and quality; producing human-like motion; grounding the gesture in the co-occurring speech, in interaction with other speakers, and in the environment; performing gesture evaluation; and integrating gesture synthesis into applications.

Real-time Gesture Animation Generation from Speech for Virtual Human Interaction

This work proposes a real-time system that synthesizes gestures directly from speech, using generative adversarial neural networks to model the speech-gesture relationship, and achieves a delay below three seconds between audio input and gesture animation.

ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech

We present ZeroEGGS, a neural network framework for speech-driven gesture generation with zero-shot style control by example. This means style can be controlled via only a short example motion clip,

Exemplar-based Stylized Gesture Generation from Speech: An Entry to the GENEA Challenge 2022

The model is a neural network that generates gesture animation from an input audio file that is embedded in a latent space using a variational framework, addressing the stochastic nature of gesture motion.

Mixed Reality Communication System for Procedural Tasks

The system allows a remote expert to spatially guide a local operator using a real-time volumetric representation of the patient, along with voice, a virtual hand metaphor, and annotations performed in situ.

Mixed Reality Communication for Medical Procedures: Teaching the Placement of a Central Venous Catheter

The results indicate that the mixed reality real-time communication system enhances visual communication, offering new possibilities compared to video teleconference-based training and for improving remote emergency assistance.



Predicting Co-verbal Gestures: A Deep and Temporal Modeling Approach

This work presents a gestural sign scheme to facilitate supervised learning and the DCNF model, which jointly learns deep neural networks and second-order linear-chain temporal contingency and shows significant improvement over previous work on gesture prediction.

Real-time prosody-driven synthesis of body language

This work presents a method for automatically synthesizing body language animations directly from the participants' speech signals, without the need for additional input, suitable for animating characters from live human speech.

Multi-objective adversarial gesture generation

This work explores the use of a generative adversarial training paradigm to map speech to 3D gesture motion in a series of smaller sub-problems, including plausible gesture dynamics, realistic joint configurations, and diverse and smooth motion.

Creating a Gesture-Speech Dataset for Speech-Based Automatic Gesture Generation

Among the categories of recorded gestures, metaphoric gestures were the most frequent at 68.41% of all gestures, followed by beat gestures (23.73%), iconic gestures (4.76%), and deictic gestures (3.11%).

Virtual character performance from speech

This work presents a method for generating a 3D virtual character performance from the audio signal by inferring the acoustic and semantic properties of the utterance. Using semantics in addition to prosody yields virtual character performances that are more appropriate than those of methods that use only prosody.

Style‐Controllable Speech‐Driven Gesture Synthesis Using Normalising Flows

This paper proposes MoGlow, a new generative model for state‐of‐the‐art realistic speech‐driven gesticulation, and demonstrates the ability to exert directorial control over the output style, such as gesture level, speed, symmetry and spatial extent.

Gesture generation by imitation: from human behavior to computer character animation

This dissertation shows how to generate conversational gestures for an animated embodied agent based on annotated text input, using TV show recordings as empirical data; a software module was developed for each stage.

Investigating the use of recurrent motion modelling for speech gesture generation

This work explores transfer learning from previous motion modelling research to improve learning outcomes for gesture generation from speech, using a recurrent network with an encoder-decoder structure that takes in prosodic speech features and generates a short sequence of gesture motion.

Synthesizing multimodal utterances for conversational agents

An incremental production model is presented that combines the synthesis of synchronized gestural, verbal, and facial behaviors with mechanisms for linking them in fluent utterances with natural co‐articulation and transition effects.