Modeling segmental durations for Japanese text-to-speech synthesis

Abstract

Accurate estimation of segmental durations is crucial for natural-sounding text-to-speech (TTS) synthesis. This paper presents a model of segmental duration used in the Bell Labs Japanese TTS system. We describe the constraints on vowel devoicing and the effects of factors such as phone identity, surrounding phone identities, accentuation, syllabic structure…

10 Figures and Tables
