Learning Latent Representations for Style Control and Transfer in End-to-end Speech Synthesis

@inproceedings{Zhang2019LearningLR,
  title={Learning Latent Representations for Style Control and Transfer in End-to-end Speech Synthesis},
  author={Ya-Jie Zhang and Shifeng Pan and Lei He and Zhenhua Ling},
  booktitle={ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2019},
  pages={6945--6949}
}
  • Ya-Jie Zhang, Shifeng Pan, Lei He, Zhenhua Ling
  • Published 2019
  • Computer Science, Engineering
  • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
In this paper, we introduce a variational autoencoder (VAE) into an end-to-end speech synthesis model to learn latent representations of speaking styles in an unsupervised manner. [...] Key method: Style transfer can be achieved in this framework by first inferring the style representation through the recognition network of the VAE, then feeding it into the TTS network to guide the style of the synthesized speech. To avoid Kullback-Leibler (KL) divergence collapse during training, several techniques are adopted. Finally, the…
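The abstract describes a recognition network that infers a style latent, a TTS decoder conditioned on it, and a KL term that can collapse during training. The following is a minimal sketch in plain Python, not the authors' code, assuming a diagonal Gaussian posterior N(mu, exp(log_var)) and a standard normal prior. All function names are hypothetical, and the linear KL-annealing schedule (with its `warmup_steps=10000` default) is an illustrative assumption: it is one common remedy for KL collapse, not necessarily one of the specific techniques the paper adopts.

```python
import math
import random
from typing import List, Optional

# Hypothetical helpers illustrating a VAE style latent for TTS (not the paper's code).
# Assumes q(z|x) = N(mu, diag(exp(log_var))) and prior p(z) = N(0, I).


def reparameterize(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) per dimension (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: List[float], log_var: List[float]) -> float:
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def kl_weight(step: int, warmup_steps: int = 10000) -> float:
    """Assumed linear KL annealing: weight is 0 at step 0 and reaches 1 after warmup_steps."""
    return min(1.0, step / warmup_steps)


def vae_tts_loss(recon_loss: float, mu: List[float], log_var: List[float], step: int) -> float:
    """TTS reconstruction loss plus the annealed KL term on the style latent."""
    return recon_loss + kl_weight(step) * kl_to_standard_normal(mu, log_var)


def style_for_transfer(mu: List[float], log_var: List[float],
                       rng: Optional[random.Random] = None) -> List[float]:
    """Style vector to feed the TTS network: the posterior mean by default,
    or a random sample from the posterior if an RNG is supplied."""
    return list(mu) if rng is None else reparameterize(mu, log_var, rng)


if __name__ == "__main__":
    mu, log_var = [0.5, 1.0], [0.0, 0.0]  # toy recognition-network outputs
    print(kl_to_standard_normal(mu, log_var))          # 0.625
    print(vae_tts_loss(1.0, mu, log_var, step=5000))   # 1.3125
    print(style_for_transfer(mu, log_var))             # [0.5, 1.0]
```

In a real system, `mu` and `log_var` would come from the recognition network run on a reference mel spectrogram. For style transfer, a reference utterance in the target style would be encoded and its posterior mean passed to the decoder. Keeping the KL weight small early in training lets the decoder learn to rely on the latent before the prior term pushes the posterior toward N(0, I).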
Learning Hierarchical Representations for Expressive Speaking Style in End-to-End Speech Synthesis
The Importance Weighted Autoencoder in End-to-End Speech Synthesis
Fine-grained style modelling and transfer in text-to-speech synthesis via content-style disentanglement
Cycle consistent network for end-to-end style transfer TTS training
Learning Robust Latent Representations for Controllable Speech Synthesis
Controlling Emotion Strength with Relative Attribute for End-to-End Speech Synthesis
