A family-of-models approach to HMM-based segmentation for unit selection speech synthesis

Abstract

When segmenting a speech database, a family of acoustic models provides multiple estimates of each boundary point. Taking a consensus of these estimates is more robust than relying on a single estimate: large labeling errors become less prevalent in the synthesis catalog, which improves the resulting voice. This paper describes HMM-based segmentation in which up to 500 related models are applied to each wavefile. In a listening test of twelve utterances, human judges preferred the proposed technique over the baseline by a tally of 6 to 2, with 4 ties.
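
The consensus idea can be illustrated with a minimal sketch. The abstract does not say how the consensus value is computed, so the per-boundary median used here is an assumption chosen for its resistance to outliers, not necessarily the authors' method. The function name `consensus_boundaries` and the sample boundary times are hypothetical.

```python
from statistics import median


def consensus_boundaries(estimates):
    """Combine boundary estimates from several models into one labeling.

    estimates: list of lists; estimates[m][b] is model m's time (in seconds)
    for boundary b. Every model must label the same number of boundaries.
    Returns one consensus time per boundary (the median; an assumption).
    """
    if not estimates:
        raise ValueError("need at least one model's estimates")
    n_boundaries = len(estimates[0])
    if any(len(e) != n_boundaries for e in estimates):
        raise ValueError("all models must label the same boundaries")
    # zip(*estimates) regroups the data so each tuple holds every model's
    # estimate for a single boundary.
    return [median(per_boundary) for per_boundary in zip(*estimates)]


# Hypothetical data: three models, three boundaries. The third model
# makes a large error on the second boundary (0.55 s instead of ~0.25 s).
family = [
    [0.10, 0.25, 0.40],
    [0.11, 0.24, 0.41],
    [0.10, 0.55, 0.40],
]
consensus = consensus_boundaries(family)
print(consensus)
```

In this toy case the single large error on the second boundary does not shift the consensus value, which is the robustness property the abstract describes.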

Cite this paper

@inproceedings{Kominek2004AFA,
  title={A family-of-models approach to HMM-based segmentation for unit selection speech synthesis},
  author={John Kominek and Alan W. Black},
  booktitle={INTERSPEECH},
  year={2004}
}