Doo Hwa Hong

It is generally known that a well-designed excitation produces high-quality signals in hidden Markov model (HMM)-based speech synthesis systems. This paper proposes a novel technique for generating excitation based on waveform interpolation (WI). To model the WI parameters, we implemented a statistical method, principal component analysis (PCA). The …
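A minimal sketch of the idea of modeling WI excitation parameters with PCA. The frame matrix, dimensions, and number of retained components below are assumptions for illustration, not values from the paper:

```python
import numpy as np

# Hypothetical data: 200 WI characteristic waveforms of 64 samples each
rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 64))

mean = frames.mean(axis=0)
centered = frames - mean

# PCA via SVD of the centered data matrix
_, s, vt = np.linalg.svd(centered, full_matrices=False)
k = 8                          # keep the first 8 principal components (assumed)
basis = vt[:k]                 # (8, 64) orthonormal basis

# Low-dimensional representation and reconstruction
coeffs = centered @ basis.T    # (200, 8) PCA coefficients per waveform
recon = coeffs @ basis + mean  # approximate waveforms from 8 coefficients

print(recon.shape)             # (200, 64)
```

Each excitation waveform is then represented by a few PCA coefficients rather than raw samples, which is what makes the parameters amenable to statistical (HMM) modeling.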
SUMMARY In our previous study, we proposed the waveform interpolation (WI) approach to model the excitation signals for hidden Markov model (HMM)-based speech synthesis. This letter presents several techniques to improve excitation modeling within the WI framework. We propose both time-domain and frequency-domain zero-padding techniques to reduce the …
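A hedged sketch of the two zero-padding operations named above; the frame length and padding amounts are arbitrary choices for illustration. Time-domain zero padding before the FFT yields a denser frequency grid, while frequency-domain zero padding before the inverse FFT interpolates the waveform onto a denser time grid:

```python
import numpy as np

x = np.cos(2 * np.pi * 5 * np.arange(32) / 32)   # 32-sample test frame

# Time-domain zero padding: append zeros, then FFT -> denser spectrum
X_dense = np.fft.rfft(np.pad(x, (0, 96)))        # 128-point analysis of 32 samples

# Frequency-domain zero padding: extend the one-sided spectrum with zeros,
# then inverse FFT -> band-limited interpolation of the waveform
X = np.fft.rfft(x)                               # 17 bins for a 32-sample frame
X_padded = np.concatenate([X, np.zeros(48)])     # 65 bins -> 128-sample IFFT
x_dense = np.fft.irfft(X_padded) * (128 / 32)    # rescale for the longer IFFT

print(X_dense.shape, x_dense.shape)              # (65,) (128,)
```

Every fourth sample of `x_dense` reproduces the original frame, confirming that the padding interpolates rather than distorts.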
Signals originating from the same speech source can appear quite different depending on a variety of acoustic effects, such as background noise, linear or nonlinear distortions incurred by recording devices, or reverberation. These acoustic effects result in mismatches between the trained speech recognition models and the input speech. One of the …
To express natural prosodic variations in continuous speech, sophisticated speech units such as context-dependent phone models are usually employed in HMM-based speech synthesis. Since the training database cannot practically cover all possible context factors, decision-tree-based HMM state clustering is commonly applied. One of the …
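A toy sketch of what decision-tree state tying accomplishes: context-dependent states that answer a set of phonetic questions the same way reach the same leaf and share one set of parameters. The question set and contexts below are invented for illustration:

```python
# Hypothetical phonetic questions about the left/right context of a phone
questions = [
    ("L_is_vowel", lambda ctx: ctx["left"] in {"a", "e", "i", "o", "u"}),
    ("R_is_nasal", lambda ctx: ctx["right"] in {"m", "n"}),
]

def leaf_id(ctx):
    # The sequence of yes/no answers identifies the leaf (tied class)
    return "".join("Y" if q(ctx) else "N" for _, q in questions)

# Three context-dependent states of the same center phone, e.g. a-t+m, e-t+n, s-t+k
states = [
    {"left": "a", "right": "m"},
    {"left": "e", "right": "n"},
    {"left": "s", "right": "k"},
]

tied = {}
for st in states:
    tied.setdefault(leaf_id(st), []).append(st)

print(len(tied))   # 2 — the first two contexts fall in the same leaf
```

In a real system the tree is grown greedily, choosing at each node the question that maximizes the likelihood gain of the split; that estimation step is omitted here.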
One of the most popular approaches to parameter adaptation in hidden Markov model (HMM)-based systems is the maximum likelihood linear regression (MLLR) technique. In our previous work, we proposed factored MLLR (FMLLR), in which an MLLR parameter is defined as a function of a control parameter vector. We presented a method to train the FMLLR parameters based …
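For reference, the core MLLR operation is an affine transform of each Gaussian mean, often written with an extended transform W = [b, A] applied to the extended mean [1, mu]. The transform values below are arbitrary; the maximum likelihood estimation of A and b is not shown:

```python
import numpy as np

mu = np.array([1.0, 2.0, 3.0])    # Gaussian mean of an HMM state
A = np.eye(3) * 0.9               # regression matrix (assumed values)
b = np.array([0.1, 0.1, 0.1])     # bias vector (assumed values)

# Extended form: W = [b, A] applied to xi = [1, mu]
W = np.hstack([b[:, None], A])
xi = np.concatenate([[1.0], mu])
mu_adapted = W @ xi               # equals A @ mu + b

print(mu_adapted)                 # [1.  1.9 2.8]
```

FMLLR's extension, as described above, makes W itself a function of a control parameter vector instead of a fixed matrix per regression class.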
The performance of a speech recognition system may be degraded even without any background noise because of linear or nonlinear distortions incurred by recording devices or reverberation. One well-known approach to reducing this channel distortion is feature mapping, which maps the distorted speech feature to its clean counterpart. The feature …
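A minimal sketch of the feature-mapping idea: learn an affine map from distorted to clean features by least squares on paired (stereo) data. The simulated channel, feature dimension, and data sizes are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
clean = rng.standard_normal((500, 13))           # clean cepstral features (assumed)
channel = np.diag(rng.uniform(0.5, 1.5, 13))     # simulated linear distortion
bias = rng.standard_normal(13) * 0.2
distorted = clean @ channel + bias               # stereo counterpart of `clean`

# Affine map [x, 1] @ M ~ clean, solved in closed form by least squares
X = np.hstack([distorted, np.ones((500, 1))])
M, *_ = np.linalg.lstsq(X, clean, rcond=None)
mapped = X @ M

print(np.abs(mapped - clean).max() < 1e-6)       # True: the channel here is exactly affine
```

Real channels are not exactly affine, which is why practical feature-mapping schemes use richer models (e.g., piecewise or nonlinear maps) trained on stereo or pseudo-stereo data.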
Speech synthesized from the same text should sound different depending on the speaking style. Current speech synthesis techniques based on the hidden Markov model (HMM) usually focus on a fixed speaking style, and changing the speaking style requires multiple parameter sets trained in different speaking styles. A promising alternative is to adapt …
In our previous work, we proposed factored maximum likelihood linear regression (FMLLR) adaptation, where each MLLR parameter is defined as a function of a control vector. In this paper, we introduce a novel technique called factored maximum likelihood kernelized regression (FMLKR) for HMM-based style-adaptive speech synthesis. In FMLKR, nonlinear regression …
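A sketch of kernel regression in the spirit of FMLKR: predicting a scalar transform parameter as a nonlinear (RBF-kernel) function of a style control vector. The kernel width, ridge term, and training data are invented for illustration; the paper's actual estimation criterion is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(2)
controls = rng.uniform(-1.0, 1.0, (20, 2))              # style control vectors (assumed)
targets = np.sin(controls[:, 0]) + controls[:, 1] ** 2  # parameter to model (assumed)

def rbf(a, b, gamma=2.0):
    # Gaussian (RBF) kernel between two sets of control vectors
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Kernel ridge regression: solve (K + lambda * I) alpha = targets
K = rbf(controls, controls)
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(controls)), targets)

def predict(c):
    # Nonlinear prediction for a new control vector
    return rbf(np.atleast_2d(np.asarray(c, dtype=float)), controls) @ alpha

print(predict([0.0, 0.0]).shape)   # (1,)
```

Replacing the linear map of FMLLR with such a kernel predictor is what allows transform parameters to vary nonlinearly with the control vector.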