Photo-real talking head with deep bidirectional LSTM

@inproceedings{Fan2015PhotorealTH,
  title={Photo-real talking head with deep bidirectional {LSTM}},
  author={Bo Fan and Lijuan Wang and Frank K. Soong and Lei Xie},
  booktitle={2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2015},
  pages={4884--4888}
}
Long short-term memory (LSTM) is a specific recurrent neural network (RNN) architecture designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. In this paper, we propose to use deep bidirectional LSTM (BLSTM) for audio/visual modeling in our photo-real talking head system. An audio/visual database of a subject talking is first recorded as our training data. The audio/visual stereo data are converted into two parallel temporal…
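The abstract's core idea is a stack of bidirectional LSTM layers that reads a whole audio feature sequence and emits one visual feature vector per frame. It can be sketched as follows. This is a minimal, untrained illustration in plain Python and not the authors' implementation. The class names (LSTMLayer, BLSTMLayer, DeepBLSTMRegressor), the feature dimensions, the layer sizes and the random weight initialization are all hypothetical. Each LSTM step uses the standard input, forget and output gates together with a memory cell. The backward LSTM runs over the reversed sequence, so every output frame can depend on both past and future audio.

```python
import math
import random


def sigmoid(x):
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


class LSTMLayer:
    """One unidirectional LSTM layer with small random (untrained) weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n_in = input_size + hidden_size  # each gate sees [x_t, h_{t-1}]
        # One weight matrix and bias vector per gate:
        # i = input gate, f = forget gate, g = cell candidate, o = output gate.
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(hidden_size)]
            for gate in "ifgo"
        }
        self.b = {gate: [0.0] * hidden_size for gate in "ifgo"}

    def _affine(self, gate, z):
        # W_gate @ z + b_gate, written with plain lists.
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def run(self, xs):
        """Process frames in order; return one hidden vector per frame."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size  # memory cell carrying long-range context
        outputs = []
        for x in xs:
            z = list(x) + h
            i = [sigmoid(a) for a in self._affine("i", z)]
            f = [sigmoid(a) for a in self._affine("f", z)]
            g = [math.tanh(a) for a in self._affine("g", z)]
            o = [sigmoid(a) for a in self._affine("o", z)]
            # Elementwise: c_t = f * c_{t-1} + i * g ;  h_t = o * tanh(c_t)
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


class BLSTMLayer:
    """Forward LSTM plus an LSTM over the reversed sequence, outputs concatenated."""

    def __init__(self, input_size, hidden_size, seed=0):
        self.fwd = LSTMLayer(input_size, hidden_size, seed)
        self.bwd = LSTMLayer(input_size, hidden_size, seed + 1)

    def run(self, xs):
        forward = self.fwd.run(xs)
        backward = self.bwd.run(xs[::-1])[::-1]  # re-align to original frame order
        return [hf + hb for hf, hb in zip(forward, backward)]  # list concatenation


class DeepBLSTMRegressor:
    """Hypothetical deep BLSTM: audio feature frames in, visual feature frames out."""

    def __init__(self, audio_dim, visual_dim, hidden_size=8, num_layers=2, seed=0):
        self.layers = []
        in_dim = audio_dim
        for k in range(num_layers):
            self.layers.append(BLSTMLayer(in_dim, hidden_size, seed + 2 * k))
            in_dim = 2 * hidden_size  # next layer sees both directions
        rng = random.Random(seed + 100)
        self.W_out = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                      for _ in range(visual_dim)]
        self.b_out = [0.0] * visual_dim

    def predict(self, audio_frames):
        hs = audio_frames
        for layer in self.layers:
            hs = layer.run(hs)
        # Linear output layer: one visual parameter vector per input frame.
        return [[sum(w * v for w, v in zip(row, h)) + bias
                 for row, bias in zip(self.W_out, self.b_out)]
                for h in hs]


# Usage: 6 frames of 3-dim audio features -> 6 frames of 2-dim visual parameters.
model = DeepBLSTMRegressor(audio_dim=3, visual_dim=2, hidden_size=4)
audio = [[0.1 * t, -0.2 * t, 0.05] for t in range(6)]
visual = model.predict(audio)
print(len(visual), len(visual[0]))  # 6 2
```

The backward pass has a visible consequence. Changing the last audio frame also changes the predicted first visual frame. A unidirectional LSTMLayer, by contrast, leaves its first output unchanged. Training is omitted here; one common choice would be to minimize squared error against recorded visual parameters using backpropagation through time.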
This paper has 56 citations.

Citations

Selected publications citing this paper (from 32 extracted citations):

A deep learning approach for generalized speech animation

ACM Trans. Graph. • 2017 • Highly Influenced

A video prediction approach for animating single face image

Multimedia Tools and Applications • 2018 • Method Support

Adaptation of an Expressive Single Speaker Deep Neural Network Speech Synthesis System

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 2018

Expressive Speech-Driven Lip Movements with Multitask Learning

2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018) • 2018

Citations per year (chart, 2015 to 2019)
Semantic Scholar estimates that this publication has 56 citations based on the available data.


References

Selected publications referenced by this paper (from 21 references):

Statistical parametric speech synthesis using deep neural networks

2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 2013

High quality lip-sync animation for 3D photo-realistic talking head

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 2012

Synthesizing visual speech trajectory with minimum generation error

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 2011

Supervised sequence labelling with recurrent neural networks

Studies in Computational Intelligence • 2008