• Corpus ID: 8037054

Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis

Takayoshi Yoshimura, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura
In this paper, we describe an HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified framework of HMM. In the system, pitch and state duration are modeled by multi-space probability distribution HMMs and multi-dimensional Gaussian distributions, respectively. The distributions for spectral parameter, pitch parameter and the state duration are clustered independently by using a decision-tree based context clustering technique… 
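As a rough illustration of the two distribution types named in the abstract, the following Python sketch scores a pitch observation under a two-space multi-space distribution (one voiced space with a one-dimensional Gaussian, one zero-dimensional unvoiced space) and scores a vector of state durations under a Gaussian. The function names, the use of log F0, the two-space layout, and the diagonal covariance are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a one-dimensional Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def msd_f0_loglik(log_f0, voiced_weight, mean, var):
    """Log output probability of one pitch observation under a two-space MSD.

    Assumption: log_f0 is a float for a voiced frame and None for an unvoiced
    frame, and voiced_weight lies strictly between 0 and 1.
    """
    if log_f0 is None:
        # Unvoiced frame: the space is zero-dimensional, so only its weight counts.
        return math.log(1.0 - voiced_weight)
    # Voiced frame: space weight times the Gaussian density in the voiced space.
    return math.log(voiced_weight) + gaussian_logpdf(log_f0, mean, var)


def duration_loglik(durations, means, variances):
    """Log density of a per-state duration vector.

    Assumption: the multi-dimensional Gaussian has a diagonal covariance, so
    the log density is a sum of independent one-dimensional terms.
    """
    return sum(
        gaussian_logpdf(d, m, v) for d, m, v in zip(durations, means, variances)
    )
```

For example, `msd_f0_loglik(None, 0.8, 5.0, 0.04)` uses only the unvoiced weight, while `msd_f0_loglik(5.0, 0.8, 5.0, 0.04)` combines the voiced weight with the Gaussian density at the mean.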

Implementation and evaluation of an HMM-based Thai speech synthesis system
The evaluation of the synthesized speech shows that tone correctness is significantly improved in some clustering styles, and that the implemented system reproduces prosody (or naturalness, in some sense) better than the unit-selection-based system using the same speech database.
Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR
It is demonstrated that a few sentences uttered by a target speaker are sufficient to adapt not only voice characteristics but also prosodic features, and synthetic speech generated from adapted models using only four sentences is very close to that from speaker dependent models trained using 450 sentences.
Hidden semi-Markov model based speech synthesis
Experimental results show that the use of HSMM training improves the naturalness of the synthesized speech.
An excitation model for HMM-based speech synthesis based on residual modeling
Preliminary results show that the novel excitation model in question eliminates the unnaturalness of synthesized speech, being comparable in quality to the best approaches thus far reported to eradicate the buzziness of HMM-based synthesizers.
Mixed excitation for HMM-based speech synthesis
Improvements to the excitation model of an HMM-based text-to-speech system are described, and the result of a listening test shows that the mixed excitation model significantly improves the quality of synthesized speech compared with the traditional excitation model.
Speech parameter generation algorithms for HMM-based speech synthesis
This paper derives a speech parameter generation algorithm for HMM-based speech synthesis, in which the speech parameter sequence is generated from HMMs whose observation vector consists of a
HMM-based singing voice synthesis system using pitch-shifted pseudo training data
This paper proposes a technique for training HMMs using pitch-shifted pseudo data and results show that the proposed technique improves the naturalness of the synthesized singing voices.
An introduction of trajectory model into HMM-based speech synthesis
A trajectory-HMM, which has been derived from the HMM under the constraints between static and dynamic features, is introduced into the training part of the HMM-based speech synthesis system, and experimental results show that the use of trajectory-HMM training improves the quality of the synthesized speech.
This paper proposes a state duration modeling method using a full covariance matrix for HMM-based speech synthesis. In this method, a full covariance matrix instead of the conventional diagonal
HMM-based speech synthesis and its applications
This thesis describes a novel approach to text-to-speech synthesis (TTS) based on hidden Markov model (HMM) and shows that by using this extended HMM, referred to as the multi-space probability distribution HMM (MSD-HMM), spectral parameter sequences and F0 patterns can be modeled and generated in a unified framework of HMM.


Duration modeling for HMM-based speech synthesis
This paper takes account of contextual factors such as stress-related factors and locational factors, in addition to phone identity factors, to synthesize good-quality speech with natural timing; the speaking rate can also be varied easily.
Speaker adaptation for HMM-based speech synthesis system using MLLR
From the results of objective and subjective tests, it is shown that the characteristics of the synthetic speech are close to the target speaker’s voice, and the speech generated from the adapted model set using 5 sentences has almost the same DMOS score as that from the speaker dependent model set.
An algorithm for speech parameter generation from continuous mixture HMMs with dynamic features
This paper proposes an algorithm for speech parameter generation from continuous mixture HMMs which include dynamic features, i.e., delta and delta-delta parameters of speech, and derives a fast algorithm by analogy with the RLS algorithm for adaptive filtering.
Speech synthesis using HMMs with dynamic features
A new text-to-speech synthesis system based on HMMs that include dynamic features, i.e., delta and delta-delta parameters of speech; the synthesized speech becomes quite smooth and natural even if the number of clustered states is small.
HMM-based smoothing for concatenative speech synthesis
An advanced smoothing system is developed that a small pilot study indicates significantly improves quality of the Whistler Text-to-Speech engine and is demonstrated to be robust by maintaining improved quality with a significant reduction in data.
The IBM trainable speech synthesis system
The speech synthesis system described in this paper uses a set of speaker-dependent decision-tree state-clustered hidden Markov models to automatically generate a leaf level segmentation of a large
Speaker interpolation in HMM-based speech synthesis system
An approach to voice characteristics conversion for an HMM-based text-to-speech synthesis system using speaker interpolation, which can synthesize speech with various voice qualities without a large database in the synthesis phase.
Hidden Markov models based on multi-space probability distribution for pitch pattern modeling
A hidden Markov model based on multi-space probability distribution (MSD) can model pitch patterns without heuristic assumptions, and a reestimation algorithm is derived that can find a critical point of the likelihood function.
Continuously variable duration hidden Markov models for speech analysis
  • S. Levinson
  • Mathematics
    ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 1986
The solution proposed here is to replace the probability distributions of duration with continuous probability density functions to form a continuously variable duration hidden Markov model (CVDHMM) which is ideally suited to specification of the durational density.
Speaker adaptation with autonomous model complexity control by MDL principle
  • Koichi Shinoda, Takao Watanabe
  • Computer Science
    1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings
  • 1996
A speaker adaptation method for continuous density HMMs, which performs well for any amount of data for adaptation, is proposed, and needs no control parameters for selecting models.