Speech Synthesis Based on Hidden Markov Models

@article{Tokuda2013SpeechSB,
  title={Speech Synthesis Based on Hidden Markov Models},
  author={Keiichi Tokuda and Yoshihiko Nankaku and Tomoki Toda and Heiga Zen and Junichi Yamagishi and Keiichiro Oura},
  journal={Proceedings of the IEEE},
  year={2013},
  volume={101},
  pages={1234--1252}
}
This paper gives a general overview of hidden Markov model (HMM)-based speech synthesis, which has recently been demonstrated to be very effective in synthesizing speech. The main advantage of this approach is its flexibility in changing speaker identities, emotions, and speaking styles. This paper also discusses the relation between the HMM-based approach and the more conventional unit-selection approach that has dominated over the last decades. Finally, advanced techniques for future… 
SPEECH SYNTHESIS USING HIDDEN MARKOV MODEL AND APPLICATION OF VOICE
TLDR
A general overview is given of hidden Markov model (HMM)-based speech synthesis, a technique that has recently been demonstrated to be very effective in synthesizing acceptable speech.
Speech Synthesis Based on Hidden Markov Models and Deep Learning
TLDR
The results indicate that the spectral characteristics of HMM voices can be improved using this approach, but additional research should be conducted to improve other parameters of the voice signal, such as energy and fundamental frequency, to obtain more natural-sounding voices.
HMM speech synthesis based on MDCT representation
TLDR
An HMM speech synthesis technique based on the modified discrete cosine transform (MDCT) representation that guarantees a perfect reconstruction of the signal frame from feature vectors and allows for a 50% overlap between frames without increasing the data vector is presented.
Excitation Modeling Method Based on Inverse Filtering for HMM-Based Speech Synthesis
TLDR
A novel excitation modeling approach for the HMM-based speech synthesis system (HTS) is presented, in which the excitation signal obtained via inverse filtering is parameterized into excitation features that are modeled using HMMs.
Investigating the shortcomings of HMM synthesis
TLDR
A framework for formally testing the causes of the currently limited quality of HMM (hidden Markov model) speech synthesis is presented, and future improvements to the framework are discussed, including its extension to more of the parameters modelled during speech synthesis.
HMM-based Speech Synthesizer for Easily Understandable Speech Broadcasting
TLDR
A Hidden Markov Model (HMM) based speech synthesis system dependent on the input speaker’s speech was implemented, and it was found that speech synthesized by learning 50 sentences exhibited sufficient performance.
Quality Assessment of HMM-Based Speech Synthesis Using Acoustical Vowel Analysis
TLDR
This paper considers nine acoustic parameters, related to jitter and shimmer, and considers their statistical significance as objective measurements of synthetic speech quality.
HIDDEN MARKOV MODEL BASED SPEECH SYNTHESIS SYSTEM IN SLOVAK LANGUAGE WITH SPEAKER INTERPOLATION
TLDR
The interpolation provides an approach to voice characteristic conversion for a hidden Markov model based text-to-speech synthesis system and allows new voices to be created without adding data to the training procedure, which is especially useful for low-resource languages such as Slovak.
Learning HMM State Sequences from Phonemes for Speech Synthesis
TLDR
A technique for learning hidden Markov model (HMM) state sequences from phonemes is presented that, combined with the modified discrete cosine transform (MDCT), is useful for speech synthesis; the quality of synthesized speech is conveniently evaluated using the well-known Itakura-Saito measure.
A Statistical Sample-Based Approach to GMM-Based Voice Conversion Using Tied-Covariance Acoustic Models
TLDR
Because the proposed method utilizes individual speech features and its formulation is the same as that of conventional GMM-based VC, it makes it possible to produce high-quality speech while keeping the flexibility of the original GMM-based VC.

References

SHOWING 1-10 OF 165 REFERENCES
Unifying unit selection and hidden Markov model speech synthesis
TLDR
This paper presents a framework which can accommodate the two most widely used contemporary speech synthesis techniques, namely unit selection and hidden Markov models, by building a very general HMM where a network of states exactly mimics the behaviour of a unit selection system.
Mixed excitation for HMM-based speech synthesis
TLDR
Improvements to the excitation model of an HMM-based text-to-speech system are described, and the result of a listening test shows that the mixed excitation model significantly improves the quality of synthesized speech compared with the traditional excitation model.
Duration modeling for HMM-based speech synthesis
TLDR
This paper takes account of contextual factors such as stress-related factors and locational factors, in addition to phone identity factors, to synthesize good-quality speech with natural timing; the speaking rate can also be varied easily.
An excitation model for HMM-based speech synthesis based on residual modeling
TLDR
Preliminary results show that the novel excitation model in question eliminates the unnaturalness of synthesized speech, being comparable in quality to the best approaches thus far reported to eradicate the buzziness of HMM-based synthesizers.
A Bayesian approach to HMM-based speech synthesis
TLDR
Experimental results show that the proposed method outperforms the conventional one in a subjective test and can be regarded as an application of the variational Bayesian method to HMM-based speech synthesis.
Factor analyzed voice models for HMM-based speech synthesis
TLDR
This paper proposes a general speech model which generates speech utterances with various voice characteristics directly from the HMM states, and factors representing voice characteristics and contextual decision trees are simultaneously optimized within a unified framework.
A Hidden Semi-Markov Model-Based Speech Synthesis System
TLDR
Subjective listening test results show that use of HSMMs, which can be viewed as HMMs with explicit state duration PDFs, improves the reported naturalness of synthesized speech.
An HMM-based speech synthesis system applied to English
This paper describes an HMM-based speech synthesis system (HTS), in which the speech waveform is generated from HMMs themselves, and applies it to English speech synthesis using the general speech…
Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
TLDR
An HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified framework of HMM is described.
The HMM-based speech synthesis system (HTS) version 2.0
TLDR
This paper describes HTS version 2.0 in detail, as well as future release plans, which include a number of new features which are useful for both speech synthesis researchers and developers.