SGToolkit: An Interactive Gesture Authoring Toolkit for Embodied Conversational Agents

@inproceedings{sgtoolkit2021,
  title={SGToolkit: An Interactive Gesture Authoring Toolkit for Embodied Conversational Agents},
  author={Youngwoo Yoon and Keunwoo Park and Minsu Jang and Jaehong Kim and Geehyuk Lee},
  booktitle={The 34th Annual ACM Symposium on User Interface Software and Technology},
  year={2021}
}
  • Youngwoo Yoon, Keunwoo Park, Minsu Jang, Jaehong Kim, Geehyuk Lee
  • Published 10 August 2021
  • Computer Science
  • The 34th Annual ACM Symposium on User Interface Software and Technology
Non-verbal behavior is essential for embodied agents such as social robots, virtual avatars, and digital humans. Existing behavior-authoring approaches, including keyframe animation and motion capture, are too expensive to use when numerous utterances require gestures. Automatic generation methods show promising results, but their output quality is not yet satisfactory, and their outputs are hard to modify to match a gesture designer's intent. We introduce a new gesture generation toolkit, named SGToolkit, …


Gesture generation by imitation: from human behavior to computer character animation
This dissertation shows how to generate conversational gestures for an animated embodied agent from annotated text input, using TV show recordings as empirical data; a software module was developed for each stage of the pipeline.
Speech gesture generation from the trimodal context of text, audio, and speaker identity
This paper presents an automatic gesture generation model that uses the multimodal context of speech text, audio, and speaker identity to reliably generate gestures that are human-like and that match the speech content and rhythm.
Style‐Controllable Speech‐Driven Gesture Synthesis Using Normalising Flows
This paper proposes a new generative model for generating state-of-the-art realistic speech-driven gesticulation, called MoGlow, and demonstrates the ability to exert directorial control over the output style, such as gesture level, speed, symmetry, and spatial extent.
Gesture controllers
We introduce gesture controllers, a method for animating the body language of avatars engaged in live spoken conversation. A gesture controller is an optimal-policy controller that schedules gesture animation in real time based on acoustic features in the speaker's speech. The modularity of the proposed method allows customization of a character's gesture repertoire, animation of non-human characters, and the use of additional inputs such as speech recognition or direct user control.
A Large, Crowdsourced Evaluation of Gesture Generation Systems on Common Data: The GENEA Challenge 2020
This paper presents the GENEA Challenge, a gesture-generation challenge in which participating teams built automatic gesture-generation systems on a common dataset; the resulting systems were evaluated in parallel in a large, crowdsourced user study using the same motion-rendering pipeline.
Implementing Expressive Gesture Synthesis for Embodied Conversational Agents
This paper presents a computational model of gesture quality, characterizes bodily expressivity with a small set of dimensions derived from a review of the psychology literature, and describes the implementation of these dimensions in the animation system, including the gesture modeling language.
Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach
A new model is proposed, named Mix-StAGE, which trains a single model for multiple speakers while learning unique style embeddings for each speaker's gestures in an end-to-end manner and allows for style preservation when learning simultaneously from multiple speakers.
Robots Learn Social Skills: End-to-End Learning of Co-Speech Gesture Generation for Humanoid Robots
The proposed end-to-end neural network model consists of an encoder for speech text understanding and a decoder that generates a sequence of gestures; the model successfully produces various gesture types, including iconic, metaphoric, deictic, and beat gestures.
Analyzing Input and Output Representations for Speech-Driven Gesture Generation
This paper presents a novel framework for automatic speech-driven gesture generation, applicable to human-agent interaction with both virtual agents and robots, using a denoising autoencoder neural network and a novel encoder network.