Waveform Generation for Text-to-speech Synthesis Using Pitch-synchronous Multi-scale Generative Adversarial Networks

@article{Juvela2019WaveformGF,
  title={Waveform Generation for Text-to-speech Synthesis Using Pitch-synchronous Multi-scale Generative Adversarial Networks},
  author={Lauri Juvela and Bajibabu Bollepalli and Junichi Yamagishi and Paavo Alku},
  journal={ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2019},
  pages={6915--6919}
}
  • Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
  • Published 2019
  • Computer Science, Engineering, Mathematics
  • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • The state-of-the-art in text-to-speech (TTS) synthesis has recently improved considerably due to novel neural waveform generation methods, such as WaveNet. [...] Key result: Listening test results show that while direct waveform generation with GAN is still far behind WaveNet, a GAN-based glottal excitation model can achieve quality and voice similarity on par with a WaveNet vocoder.
