Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss

@article{Chen2019HierarchicalCT,
  title={Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss},
  author={Lele Chen and Ross K. Maddox and Zhiyao Duan and Chenliang Xu},
  journal={2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019},
  pages={7824-7833}
}
We devise a cascade GAN approach to generate talking face video, which is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions. [...] Compared to a direct audio-to-image approach, our cascade approach avoids fitting spurious correlations between audiovisual signals that are irrelevant to the speech content. We humans are sensitive to temporal discontinuities and subtle artifacts in video.
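The following is a minimal sketch of the cascade structure the abstract describes (audio features drive facial landmarks, and landmarks rather than raw audio drive the pixels), not the authors' reference implementation. The module names, layer sizes, and tensor shapes are illustrative assumptions.

```python
# Hypothetical two-stage cascade: audio -> landmarks -> frames.
# All names and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn

class AudioToLandmark(nn.Module):
    """Maps an audio feature window (e.g. MFCCs) to 2-D facial landmarks."""
    def __init__(self, audio_dim=28 * 12, n_landmarks=68):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim, 256), nn.ReLU(),
            nn.Linear(256, n_landmarks * 2),
        )

    def forward(self, audio):                    # (B, audio_dim)
        return self.net(audio).view(-1, 68, 2)   # (B, 68, 2)

class LandmarkToFrame(nn.Module):
    """Renders a face frame conditioned on landmarks and a reference image."""
    def __init__(self, n_landmarks=68, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(n_landmarks * 2, 512), nn.ReLU(),
            nn.Linear(512, 3 * img_size * img_size), nn.Tanh(),
        )

    def forward(self, landmarks, reference):     # (B, 68, 2), (B, 3, H, W)
        delta = self.net(landmarks.flatten(1)).view(
            -1, 3, self.img_size, self.img_size)
        return (reference + delta).clamp(-1, 1)  # residual on reference frame

# Usage: the image stage never sees raw audio, only the intermediate landmarks,
# which is what decouples speech content from irrelevant audiovisual correlations.
audio = torch.randn(4, 28 * 12)
reference = torch.rand(4, 3, 64, 64) * 2 - 1
landmarks = AudioToLandmark()(audio)
frame = LandmarkToFrame()(landmarks, reference)
print(landmarks.shape, frame.shape)  # (4, 68, 2), (4, 3, 64, 64)
```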