Voice-driven animation

Matthew Brand and Ken Shan
MERL—a Mitsubishi Electric Research Lab
201 Broadway, Cambridge, MA 02139
brand@merl.com

Abstract

We introduce a method for learning a mapping between signals and use it to drive facial animation directly from vocal cues. Instead of depending on heuristic intermediate representations such as phonemes or visemes, the system learns its own representation, which includes dynamical and contextual information. In principle, this allows the system to make optimal use of context to handle ambiguity and relatively long-lasting facial co-articulation effects. The output is a series of facial control parameters suitable for driving many different kinds of animation, ranging from photo-realistic image warps to 3D cartoon characters.
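The abstract does not specify the model used to realize this mapping, so the following is only a hypothetical sketch of the interface it describes: a sequence of per-frame vocal features goes in, a sequence of facial control parameters comes out, and the prediction at each frame can draw on neighboring frames as context. The windowed least-squares regressor, feature dimensions, and function names below are illustrative assumptions, not the authors' method.

# Hypothetical illustration only -- not the paper's actual model.
# Maps per-frame audio features to facial control parameters using a
# short temporal context window, so each prediction can exploit
# surrounding frames (a crude stand-in for co-articulation context).
import numpy as np

def stack_context(features, radius=5):
    """Concatenate each frame with its +/- `radius` neighboring frames."""
    T, d = features.shape
    padded = np.pad(features, ((radius, radius), (0, 0)), mode="edge")
    return np.stack([padded[t:t + 2 * radius + 1].ravel() for t in range(T)])

def fit_mapping(audio_feats, face_params, radius=5):
    """Least-squares mapping from windowed audio features to control params."""
    X = stack_context(audio_feats, radius)
    X = np.hstack([X, np.ones((X.shape[0], 1))])          # bias term
    W, *_ = np.linalg.lstsq(X, face_params, rcond=None)
    return W

def predict(audio_feats, W, radius=5):
    """One row of facial control parameters per input audio frame."""
    X = stack_context(audio_feats, radius)
    X = np.hstack([X, np.ones((X.shape[0], 1))])
    return X @ W

# Toy usage: 300 frames of 13-dim acoustic features -> 20 control parameters.
rng = np.random.default_rng(0)
train_audio = rng.normal(size=(300, 13))
train_face = rng.normal(size=(300, 20))
W = fit_mapping(train_audio, train_face)
controls = predict(rng.normal(size=(60, 13)), W)   # drives any rigged face
print(controls.shape)   # (60, 20)

The output rows play the role of the facial control parameters mentioned in the abstract: because they are generic controls rather than rendered images, the same predicted sequence could in principle drive image warps or a 3D character rig.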


Cite this paper

@inproceedings{Brand1998VoicedrivenA,
  title     = {Voice-driven animation},
  author    = {Matthew Brand and Ken Shan},
  booktitle = {Workshop on Perceptual User Interfaces},
  address   = {San Francisco, CA},
  month     = nov,
  year      = {1998}
}