Automatic Segmentation and Labeling for Mandarin Chinese Speech Corpus for Concatenation-based TTS

Abstract

Cheng-Yuan Lin, Jyh-Shing Roger Jang, Kuan-Ting Chen
Multimedia Information Retrieval Laboratory, Dept. of Computer Science, National Tsing Hua University, HsingChu, Taiwan
+88635715131-3506
{gavins, jang, marco}@wayne.cs.nthu.edu.tw

Precise phone/syllable boundary labeling of utterances in a speech corpus plays an important role in constructing corpus-based TTS (text-to-speech) systems. However, automatic labeling based on Viterbi forced alignment does not always produce satisfactory results. Moreover, a labeling method that suits one language does not necessarily produce desirable results for another. Hence, in this paper we propose a new procedure to refine the boundaries in a Mandarin speech corpus. This procedure employs different sets of acoustic features for four different phonetic categories. In addition, a new scheme is designed to deal with the case of “periodic voiced + periodic voiced”, which produces most of the segmentation errors in our experiment. Several experiments are conducted to demonstrate the feasibility of the proposed approach.

12 Figures and Tables

Cite this paper

@inproceedings{Lin2005AutomaticSA, title={Automatic Segmentation and Labeling for Mandarin Chinese Speech Corpus for Concatenation-based TTS}, author={Cheng-Yuan Lin and Jyh-Shing Roger Jang and Kuan-Ting Chen}, year={2005} }