HMM-based visual speech synthesis using dynamic visemes


In this paper we incorporate dynamic visemes into hidden Markov model (HMM)-based visual speech synthesis. Dynamic visemes represent intuitive visual gestures identified automatically by clustering purely visual speech parameters. Because they span multiple phones, they capture the effects of visual coarticulation explicitly within the unit. The previous application of dynamic visemes to synthesis used a sample-based approach, in which cluster centroids were concatenated to form parameter trajectories corresponding to novel visual speech. Here we generalize the use of these units to create more flexible and dynamic animation using an HMM-based synthesis framework. We show, using objective and subjective testing, that an HMM synthesizer trained using dynamic visemes can generate better visual speech than HMM synthesizers trained using either phone or traditional viseme units.

Cite this paper

@inproceedings{Thangthai2015HMMbasedVS,
  title={HMM-based visual speech synthesis using dynamic visemes},
  author={Ausdang Thangthai and Barry-John Theobald},
  booktitle={AVSP},
  year={2015}
}