This paper derives a speech parameter generation algorithm for HMM-based speech synthesis, in which speech parameter sequence is generated from HMMs whose observation vector consists of spectral parameter vector and its dynamic feature vectors. In the algorithm, we assume that the state sequence (state and mixture sequence for the multi-mixture case) or a …
A statistical parametric speech synthesis system based on hidden Markov models (HMMs) has grown in popularity over the last few years. This system simultaneously models spectrum, excitation, and duration of speech using context-dependent HMMs and generates speech waveforms from the HMMs themselves. Since December 2002, we have publicly released an …
In this paper, we describe a novel spectral conversion method for voice conversion (VC). A Gaussian mixture model (GMM) of the joint probability density of source and target features is employed for performing spectral conversion between speakers. The conventional method converts spectral parameters frame by frame based on the minimum mean square error.
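The conventional frame-by-frame conversion mentioned above can be illustrated with a minimal sketch. It assumes scalar source and target features and an already trained joint GMM; the function names and the dictionary keys (`weight`, `mu_x`, `mu_y`, `var_xx`, `cov_yx`) are hypothetical and chosen only for this example. Under those assumptions, the minimum mean square error estimate of the target frame is the posterior-weighted sum of the per-component conditional means.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert_frame(x, components):
    """MMSE estimate E[y | x] under a joint GMM of (x, y).

    `components` is a list of dicts (hypothetical layout) with keys:
    weight, mu_x, mu_y, var_xx (source variance), cov_yx (cross-covariance).
    """
    # Unnormalized posterior of each mixture component given the source frame.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mu_x"], c["var_xx"]) for c in components
    ]
    total = sum(likelihoods)

    y = 0.0
    for lik, c in zip(likelihoods, components):
        posterior = lik / total
        # Conditional mean of y given x within this component.
        cond_mean = c["mu_y"] + c["cov_yx"] / c["var_xx"] * (x - c["mu_x"])
        y += posterior * cond_mean
    return y


if __name__ == "__main__":
    gmm = [
        {"weight": 0.5, "mu_x": -1.0, "mu_y": -2.0, "var_xx": 1.0, "cov_yx": 0.3},
        {"weight": 0.5, "mu_x": 1.0, "mu_y": 2.0, "var_xx": 1.0, "cov_yx": 0.3},
    ]
    print([round(convert_frame(x, gmm), 3) for x in (-1.0, 0.0, 1.0)])
```

Each frame is converted independently of its neighbours in this sketch, which is the property the abstract attributes to the conventional method.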
This paper gives a general overview of techniques in statistical parametric speech synthesis. One of the instances of these techniques, called HMM-based generation synthesis (or simply HMM-based synthesis), has recently been shown to be very effective in generating acceptable speech synthesis. This paper also contrasts these techniques with the more …
In this paper, we describe an HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified framework of HMM. In the system, pitch and state duration are modeled by multi-space probability distribution HMMs and multi-dimensional Gaussian distributions, respectively. The distributions for spectral …
This paper discusses a hidden Markov model (HMM) based on multi-space probability distribution (MSD). The HMMs are widely-used statistical models to characterize the sequence of speech spectra and have successfully been applied to speech recognition systems. From these facts, it is considered that the HMM is useful for modeling pitch patterns of speech.
This paper describes an HMM-based speech synthesis system (HTS), in which speech waveform is generated from HMMs themselves, and applies it to English speech synthesis using the general speech synthesis architecture of Festival. Similarly to other data-driven speech synthesis approaches, HTS has a compact language dependent module: a list of contextual …
This paper describes a novel parameter generation algorithm for the HMM-based speech synthesis. The conventional algorithm generates a trajectory of static features that maximizes an output probability of a parameter sequence consisting of the static and dynamic features from HMMs under an actual constraint between the two features. The generated trajectory …
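The conventional algorithm described above can be sketched for a single one-dimensional feature. The sketch assumes that the state sequence is already fixed, that each frame therefore has a known Gaussian mean and variance for the static feature and for one delta feature, and that the delta is computed as 0.5 * (c[t+1] - c[t-1]) with zero padding at the boundaries. The delta window, the boundary handling, and all function names are assumptions made for illustration. With the constraint written as o = W c, maximizing the Gaussian output probability reduces to solving the linear system (Wᵀ P W) c = Wᵀ P μ, where P holds the inverse variances.

```python
def build_window_matrix(num_frames):
    """Matrix W mapping a static track c to interleaved [static, delta] rows."""
    W = [[0.0] * num_frames for _ in range(2 * num_frames)]
    for t in range(num_frames):
        W[2 * t][t] = 1.0                      # static row copies c[t]
        if t > 0:
            W[2 * t + 1][t - 1] = -0.5         # delta row: 0.5 * (c[t+1] - c[t-1])
        if t < num_frames - 1:
            W[2 * t + 1][t + 1] = 0.5
    return W


def solve_linear_system(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        residual = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = residual / M[i][i]
    return x


def generate_trajectory(means, variances):
    """Static trajectory maximizing the likelihood of [static, delta] features.

    `means` and `variances` have length 2T, interleaved per frame as
    [static_0, delta_0, static_1, delta_1, ...].
    """
    T = len(means) // 2
    W = build_window_matrix(T)
    precision = [1.0 / v for v in variances]
    rows = range(2 * T)
    # Normal equations: A = W^T P W, b = W^T P mu.
    A = [[sum(W[r][i] * precision[r] * W[r][j] for r in rows) for j in range(T)]
         for i in range(T)]
    b = [sum(W[r][i] * precision[r] * means[r] for r in rows) for i in range(T)]
    return solve_linear_system(A, b)


if __name__ == "__main__":
    # A step in the static means with zero-mean deltas gives a smoothed track.
    mu = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0]
    var = [1.0] * 8
    print([round(c, 3) for c in generate_trajectory(mu, var)])
```

When the delta variances are made very large, the solution falls back to the static means, which shows that the smoothing comes entirely from the dynamic-feature constraint. A practical implementation would exploit the banded structure of Wᵀ P W instead of dense elimination.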
The generalized cepstral analysis method is viewed as a unified approach to the cepstral method and the linear prediction method, in which the model spectrum varies continuously from all-pole to cepstral according to the value of a parameter γ. Since the human ear has high resolution at low frequencies, introducing similar characteristics to the model …
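The dependence on γ can be made concrete with the generalized logarithm commonly used to define the generalized cepstrum. The symbols s_γ, c_γ(m), and the order M below follow that standard formulation and are not taken from the truncated abstract, so they should be read as assumptions.

```latex
s_\gamma(w) =
\begin{cases}
  (w^{\gamma} - 1)/\gamma, & 0 < |\gamma| \le 1, \\
  \log w,                  & \gamma = 0,
\end{cases}
\qquad
H(z) = s_\gamma^{-1}\!\left( \sum_{m=0}^{M} c_\gamma(m)\, z^{-m} \right)
     =
\begin{cases}
  \left( 1 + \gamma \sum_{m=0}^{M} c_\gamma(m)\, z^{-m} \right)^{1/\gamma}, & 0 < |\gamma| \le 1, \\
  \exp \sum_{m=0}^{M} c_\gamma(m)\, z^{-m},                                 & \gamma = 0.
\end{cases}
```

Setting γ = -1 gives H(z) = 1 / (1 - Σ c_{-1}(m) z^{-m}), an all-pole model as in linear prediction, while γ = 0 gives the exponential (cepstral) model; intermediate values of γ interpolate between the two.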