Corpus ID: 218763127

Pitchtron: Towards audiobook generation from ordinary people's voices

@article{Jung2020PitchtronTA,
  title={Pitchtron: Towards audiobook generation from ordinary people's voices},
  author={Sunghee Jung and Hoirin Kim},
  journal={ArXiv},
  year={2020},
  volume={abs/2005.10456}
}
  • Sunghee Jung, Hoirin Kim
  • Published 2020
  • Computer Science, Engineering
  • ArXiv
  • In this paper, we explore prosody transfer for audiobook generation under a rather realistic condition in which the training DB is plain audio, mostly from multiple ordinary people, and the reference audio given during inference is from professional speakers and is richer in prosody than the training DB. Specifically, we explore transferring Korean dialects and emotive speech even though the training set is mostly composed of standard, neutral Korean. We found that under this setting, the original global style token method…

    References

    Investigation of Enhanced Tacotron Text-to-speech Synthesis Systems with Self-attention for Pitch Accent Language

    Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

    Robust and Fine-grained Prosody Control of End-to-end Speech Synthesis

    • Younggun Lee, Taesu Kim
    • Computer Science, Engineering
    • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • 2019