Deep Voice 2: Multi-Speaker Neural Text-to-Speech

Sercan Ömer Arik, Gregory Frederick Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou
We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional trainable speaker embeddings to generate different voices from a single model. As a starting point, we show improvements over the two state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and Tacotron. We introduce Deep Voice 2, which is based on a pipeline similar to that of Deep Voice 1, but is constructed with higher-performance building blocks and demonstrates a significant audio quality…
This paper has highly influenced 14 other papers.
This paper has 110 citations.
This paper has been referenced on Twitter 33 times.


Publications citing this paper.

Emphatic Speech Synthesis and Control Based on Characteristic Transferring in End-to-End Speech Synthesis

2018 First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia) • 2018

End-to-End Neural Speech Synthesis


ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech


Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 2018



Publications referenced by this paper.

Tacotron: Towards End-to-End Speech Synthesis
