Automatic Segmentation Combining an HMM-based Approach and Spectral Boundary Correction


Currently, AT&T Labs’ Natural Voices multilingual TTS system produces high-quality synthetic speech from a large-scale speech corpus [1]. In the development of such systems, automatic segmentation is a major component technology. The prevalent approach to automatic segmentation in speech synthesis is based on Hidden Markov Models (HMMs). Even though the HMM-based approach is the most automatic and reliable one, it still has several limitations. These include mismatches between hand-labeled transcriptions and HMM alignment labels, which can lead to discontinuities in the synthetic speech, and the need for hand-labeled bootstrap data for HMM initialization. This paper introduces a new approach to automatic segmentation for unit-concatenative speech synthesis that aims both to minimize human intervention and to achieve higher segmental quality in the synthetic speech, by combining a conventional HMM-based approach with spectral boundary correction. A preference test demonstrates that the proposed method is effective in reducing discontinuities in synthetic speech.
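The abstract does not state which spectral criterion drives the boundary correction, so the following is only a minimal sketch under stated assumptions. It takes boundary frame indices from an HMM forced alignment as a starting point. Each boundary is then moved, within a small search window, to the frame where the Euclidean distance between consecutive feature vectors (for example, MFCC frames) is largest. The function names `spectral_change` and `refine_boundaries` and the `window` parameter are hypothetical illustrations, not the paper's method.

```python
import math
from typing import List, Sequence


def spectral_change(frames: Sequence[Sequence[float]], t: int) -> float:
    """Assumed spectral-change measure: Euclidean distance between frames t-1 and t."""
    return math.dist(frames[t - 1], frames[t])


def refine_boundaries(
    frames: Sequence[Sequence[float]],
    hmm_boundaries: Sequence[int],
    window: int = 3,
) -> List[int]:
    """Hypothetical correction step: shift each HMM boundary to the point of
    maximal spectral change within +/- `window` frames.

    A boundary value t marks the cut between frame t-1 and frame t.
    Refined boundaries stay strictly increasing and inside [1, len(frames) - 1].
    """
    n = len(frames)
    refined: List[int] = []
    prev = 0  # previous refined boundary (0 = start of utterance)
    for i, b in enumerate(hmm_boundaries):
        # Do not cross the next original HMM boundary (or the end of the utterance).
        nxt = hmm_boundaries[i + 1] if i + 1 < len(hmm_boundaries) else n
        lo = max(b - window, prev + 1, 1)
        hi = min(b + window, nxt - 1, n - 1)
        if lo > hi:
            # No admissible candidate: keep the HMM boundary unchanged.
            best = b
        else:
            # max() returns the first index with the largest spectral change.
            best = max(range(lo, hi + 1), key=lambda t: spectral_change(frames, t))
        refined.append(best)
        prev = best
    return refined


if __name__ == "__main__":
    # Toy 2-D "spectra": three steady regions with abrupt changes at frames 3 and 6.
    frames = [[0, 0], [0, 0], [0, 0], [5, 5], [5, 5], [5, 5], [1, 1], [1, 1]]
    print(refine_boundaries(frames, [2, 5], window=2))  # [3, 6]
```

In this toy run the HMM boundaries at frames 2 and 5 each sit one frame away from the true spectral discontinuity, and the correction step moves them to frames 3 and 6. Keeping the search window small lets the HMM alignment decide *which* phone boundary is meant, while the spectral measure only fine-tunes *where* it falls.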


3 Figures and Tables


@inproceedings{Kim2002AutomaticSC, title={Automatic Segmentation Combining and Spectral Boundary}, author={Yeon-Jun Kim}, year={2002} }