Corpus ID: 182952550

Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis

@article{Battenberg2019EffectiveUO,
  title={Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis},
  author={Eric Battenberg and Soroosh Mariooryad and Daisy Stanton and R. Skerry-Ryan and M. Shannon and D. Kao and Tom Bagby},
  journal={ArXiv},
  year={2019},
  volume={abs/1906.03402}
}
  • Eric Battenberg, Soroosh Mariooryad, +4 authors Tom Bagby
  • Published 2019
  • Computer Science, Engineering
  • ArXiv
  • Recent work has explored sequence-to-sequence latent variable models for expressive speech synthesis (supporting control and transfer of prosody and style), but has not presented a coherent framework for understanding the trade-offs between the competing methods. In this paper, we propose embedding capacity (the amount of information the embedding contains about the data) as a unified method of analyzing the behavior of latent variable models of speech, comparing existing heuristic (non…
    15 Citations
    Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE and Autoregressive Prosody Prior
    • Guangzhi Sun, Y. Zhang, +5 authors Yonghui Wu
    • Computer Science, Engineering
    • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • 2020
    Semi-Supervised Generative Modeling for Controllable Speech Synthesis
    Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding
    Semi-Supervised Learning Based on Hierarchical Generative Models for End-to-End Speech Synthesis
    Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis
    Controlling Emotion Strength with Relative Attribute for End-to-End Speech Synthesis
    Location-Relative Attention Mechanisms for Robust Long-Form Speech Synthesis
