The USTC and iFlytek Speech Synthesis Systems

Abstract

This paper introduces the speech synthesis systems developed by USTC and iFlytek for Blizzard Challenge 2007. Both systems are HMM-based and employ similar training algorithms, in which context-dependent HMMs for spectrum, F0, and duration are estimated from the acoustic features and contextual information of the training database. However, the two systems adopt different synthesis methods. In the USTC system, speech parameters are generated directly from the statistical models, and a parametric synthesizer reconstructs the speech waveform. The iFlytek system is a waveform concatenation system that uses the maximum likelihood criterion of the statistical models to guide the selection of phone-sized candidate units. Comparing the evaluation results of the two systems in Blizzard Challenge 2007, we find that the parametric synthesis system achieves better intelligibility than the unit selection system. On the other hand, the speech synthesized by the unit selection system is more similar to the original speech and more natural, especially when the full training set is used.
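A minimal sketch of how a maximum likelihood criterion can rank phone-sized candidate units, assuming each context-dependent target model is collapsed to a single diagonal Gaussian over frame features. Real HMM-based systems use multi-state models with separate spectrum, F0, and duration streams. The names `DiagGaussian`, `Candidate`, `unit_score`, and `select_units` are hypothetical, and no concatenation cost is modeled; this illustrates the criterion, not the iFlytek implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagGaussian:
    """Hypothetical stand-in for a context-dependent phone model: one diagonal Gaussian."""
    mean: List[float]
    var: List[float]

    def log_likelihood(self, frame: Sequence[float]) -> float:
        # log N(x; mu, diag(var)), summed over feature dimensions
        return sum(
            -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            for x, m, v in zip(frame, self.mean, self.var)
        )


@dataclass
class Candidate:
    unit_id: str               # hypothetical ID of a phone-sized unit in the speech database
    frames: List[List[float]]  # acoustic feature frames of that unit (assumed non-empty)


def unit_score(model: DiagGaussian, cand: Candidate) -> float:
    """Average per-frame log-likelihood of a candidate under the target phone model."""
    return sum(model.log_likelihood(f) for f in cand.frames) / len(cand.frames)


def select_units(targets: List[DiagGaussian], candidates: List[List[Candidate]]) -> List[str]:
    """For each target phone, keep the candidate the target model finds most likely."""
    return [
        max(cands, key=lambda c: unit_score(model, c)).unit_id
        for model, cands in zip(targets, candidates)
    ]


# Usage: two target phones, each with two candidate units from the database.
targets = [DiagGaussian([0.0, 0.0], [1.0, 1.0]), DiagGaussian([5.0, 5.0], [1.0, 1.0])]
candidates = [
    [Candidate("a1", [[0.1, -0.1], [0.0, 0.2]]), Candidate("a2", [[2.0, 2.0]])],
    [Candidate("b1", [[0.0, 0.0]]), Candidate("b2", [[4.8, 5.1], [5.2, 4.9]])],
]
print(select_units(targets, candidates))  # → ['a1', 'b2']
```

Averaging the log-likelihood over frames, rather than summing it, is an assumption made here so that candidates of different lengths are compared on equal terms; a fuller system would score duration with its own model instead.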

8 Figures and Tables

Statistics

[Chart: Citations per Year, 2008 to 2017]

59 Citations

Semantic Scholar estimates that this publication has 59 citations based on the available data.

Cite this paper

@inproceedings{Ling2007TheUA,
  title={The USTC and iFlytek Speech Synthesis Systems},
  author={Zhen-Hua Ling and Long Qin and Heng Lu and Yu Gao and Li-Rong Dai and Ren-Hua Wang and Yuan Jiang and Zhiwei Zhao and Jin-hui Yang and Jian Zhi Chen and Guo-Ping Hu},
  year={2007}
}