Corpus ID: 203597855

A Comparison between Letters and Phones as Input to Sequence-to-Sequence Models for Speech Synthesis

@inproceedings{Taylor2019ACB,
  title={A Comparison between Letters and Phones as Input to Sequence-to-Sequence Models for Speech Synthesis},
  author={J. Taylor},
  year={2019}
}
  • Neural sequence-to-sequence (S2S) models for text-to-speech synthesis (TTS) may take letter or phone input sequences. Since for many languages phones have a more direct relationship to the acoustic signal, they lead to improved quality. But generating phone transcriptions from text requires an expensive dictionary and an error-prone grapheme-to-phoneme (G2P) model, and the relative improvement over using letters has yet to be quantified. In approaching this question, we presume that letter-input…
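The trade-off the abstract describes — phone input needs a pronunciation dictionary plus a G2P model, while letter input needs neither — can be sketched as a toy TTS front end. This is a hedged illustration, not code from the paper: the `LEXICON` entries and the fallback-to-letters policy are hypothetical stand-ins for a real dictionary (e.g. Combilex) backed by a trained G2P model.

```python
# Hypothetical miniature pronunciation lexicon; real systems use large
# dictionaries plus a trained G2P model for out-of-vocabulary words.
LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "synthesis": ["S", "IH", "N", "TH", "AH", "S", "AH", "S"],
}

def to_input_sequence(text):
    """Map text to a symbol sequence for an S2S model:
    phones where the lexicon has an entry, raw letters otherwise."""
    symbols = []
    for word in text.lower().split():
        if word in LEXICON:
            symbols.extend(LEXICON[word])   # phone input
        else:
            symbols.extend(list(word))      # letter-input fallback
    return symbols

print(to_input_sequence("speech model"))
# → ['S', 'P', 'IY', 'CH', 'm', 'o', 'd', 'e', 'l']
```

A letters-only system would skip the lexicon entirely and always take the fallback branch, which is exactly the cheaper alternative the paper compares against.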
    1 Citation

    Phonological Features for 0-shot Multilingual Speech Synthesis
