Learn More
The CMU Arctic databases designed for the purpose of speech synthesis research. These single speaker speech databases have been carefully recorded under studio conditions and consist of approximately 1200 phonetically balanced English utterances. In addition to wavefiles, the databases provide complete support for the Festival Speech Synthesis System,(More)
This report introduces the CMU Arctic databases designed for the purpose of speech synthesis research. These single speaker speech databases have been carefully recorded under studio conditions and consist of nearly 1150 phonetically balanced English utterances. They are distributed as free software, without restriction on commercial or non-commercial use.(More)
When developing synthesizers for new languages one must select a phoneset, record phonetically balanced sentences, build up a pronunciation lexicon, and evaluate the results. An objective measure of voice quality can be very useful, provided it is calibrated across multiple speakers, languages, and databases. As a substitute for full listening tests, this(More)
In this paper we describe the design and implementation of a user interface for SPICE, a web-based toolkit for rapid prototyping of speech and language processing components. We report on the challenges and experiences gathered from testing these tools in an advanced graduate hands-on course, in which we created speech recognition, speech synthesis, and(More)
For segmenting a speech database, using a family of acoustic models provides multiple estimates of each boundary point. This is more robust than a single estimate because by taking consensus values, large labeling errors are less prevalent in the synthesis catalog, which improves the resulting voice. This paper describes HMM-based segmentation in which up(More)
The speed with which pronunciation dictionaries can be bootstrapped depends on the efficiency of learning algorithms and on the ordering of words presented to the user. This paper presents an active-learning word selection strategy that is mindful of human limitations. Learning rates approach that of an oracle system that knows the final LTS rule set.
Acknowledging the lessons of Blizzard Challenge 2005 – that smooth prosodic cadence supersedes spectral resolution – but wanting a system devoid of vocoding artifacts – we introduce a hybrid trajectory-selection synthesizer. Using a parametric synthesizer to generate a pitch-synchronous sequence of F0/duration/power and spectral vectors, this trajectory(More)