Controllable neural text-to-speech synthesis using intuitive prosodic features
@inproceedings{Raitio2020ControllableNT,
  title={Controllable neural text-to-speech synthesis using intuitive prosodic features},
  author={T. Raitio and Ramya Rasipuram and Dan Castellani},
  booktitle={INTERSPEECH},
  year={2020}
}
Modern neural text-to-speech (TTS) synthesis can generate speech that is indistinguishable from natural speech. However, the prosody of generated utterances often represents the average prosodic style of the database instead of having wide prosodic variation. Moreover, the generated prosody is solely defined by the input text, which does not allow for different styles for the same sentence. In this work, we train a sequence-to-sequence neural network conditioned on acoustic speech features to…
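The abstract above is truncated, so the exact acoustic features used by the authors are not stated here. As a minimal sketch, assuming utterance-level features such as mean log-F0, log-F0 range and mean log-energy (an illustrative choice, not necessarily the paper's feature set), conditioning a sequence-to-sequence encoder on them could look like the code below. All function names are hypothetical, and plain Python lists stand in for tensors.

```python
import math
from typing import List, Sequence


def utterance_prosody_features(f0_hz: Sequence[float],
                               frame_energy: Sequence[float]) -> List[float]:
    """Hypothetical utterance-level prosodic features (assumed, not from the paper).

    Returns [mean log-F0, log-F0 range, mean log-energy].
    Unvoiced frames are marked with f0 == 0 and are skipped for the F0 statistics.
    """
    voiced = [math.log(f) for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("utterance has no voiced frames")
    mean_log_f0 = sum(voiced) / len(voiced)
    # Simple max-min range; a percentile range would be less sensitive to F0 errors.
    log_f0_range = max(voiced) - min(voiced)
    # Small epsilon avoids log(0) on silent frames.
    mean_log_energy = sum(math.log(e + 1e-8) for e in frame_energy) / len(frame_energy)
    return [mean_log_f0, log_f0_range, mean_log_energy]


def condition_encoder_outputs(encoder_outputs: List[List[float]],
                              prosody: List[float]) -> List[List[float]]:
    """Concatenate the same utterance-level prosody vector to every encoder timestep."""
    return [frame + prosody for frame in encoder_outputs]


def shift_pitch(prosody: List[float], factor: float) -> List[float]:
    """Hypothetical inference-time control: scale the mean pitch by `factor`."""
    return [prosody[0] + math.log(factor)] + prosody[1:]


if __name__ == "__main__":
    feats = utterance_prosody_features([100.0, 0.0, 200.0], [1.0, 1.0, 1.0])
    encoded = [[0.1, 0.2], [0.3, 0.4]]  # toy encoder outputs, 2 timesteps x 2 dims
    print(condition_encoder_outputs(encoded, shift_pitch(feats, 1.2)))
```

The design point this illustrates is that during training the vector is extracted from the reference recording, while at synthesis time it can be set directly (for example, raising the mean pitch by about 20% with `shift_pitch(feats, 1.2)`), so the same sentence can be rendered in different styles.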
2 Citations
Exemplar-Based Emotive Speech Synthesis
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2021
References
Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities
- Computer Science
- ArXiv
- 2019
- 8 citations
- Highly Influential
CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network
- Computer Science, Engineering
- ICML
- 2019
- 28 citations
Robust and Fine-grained Prosody Control of End-to-end Speech Synthesis
- Computer Science, Engineering
- ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019
- 43 citations
Fine-grained robust prosody transfer for single-speaker neural text-to-speech
- Computer Science, Engineering
- INTERSPEECH
- 2019
- 12 citations
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
- Computer Science, Engineering
- ICML
- 2018
- 186 citations
- Highly Influential
Tacotron: Towards End-to-End Speech Synthesis
- Computer Science
- INTERSPEECH
- 2017
- 645 citations
- Highly Influential
Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions
- Computer Science
- 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
- 776 citations
- Highly Influential
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
- Computer Science, Engineering
- ICML
- 2018
- 218 citations
Predicting Expressive Speaking Style from Text in End-To-End Speech Synthesis
- Computer Science, Engineering
- 2018 IEEE Spoken Language Technology Workshop (SLT)
- 2018
- 35 citations