Emphatic Speech Prosody Prediction with Deep LSTM Networks

@article{Shechtman2018EmphaticSP,
  title={Emphatic Speech Prosody Prediction with Deep {LSTM} Networks},
  author={Slava Shechtman and Moran Mordechay},
  journal={2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2018},
  pages={5119--5123}
}
Controllable generation of emphasis in speech is desirable for expressive TTS systems utilized in various dialog applications. Usually such models remain voice-specific and the strength of emphasis can't be readily controlled. In this work we present a flexible emphatic prosody generation model based on Deep Recurrent Neural Networks (DRNN) for controllable word-level emphasis realization. The word emphasis DRNN model was trained on syllable-level piecewise linear prosodic trajectory parameters…
