Learn More
This paper studies a deep neural network (DNN) based voice source modelling method in the synthesis of speech with varying vocal effort. The new trainable voice source model learns a mapping between the acoustic features and the time-domain pitch-synchronous glottal flow waveform using a DNN. The voice source model is trained with various speech material(More)
Achieving high quality and naturalness in statistical parametric synthesis of female voices remains to be difficult despite recent advances in the study area. Vocoding is one such key element in all statistical speech synthesizers that is known to affect the synthesis quality and naturalness. The present study focuses on a special type of vocoding, glottal(More)
While the characteristics of the amplitude spectrum of the voiced excitation have been studied widely both in natural and synthetic speech, the role of the excitation phase has remained less explored. Especially in speech synthesis, the phase information is often omitted for simplicity. This study investigates the impact of phase information of the(More)
GlottHMM is a previously developed vocoder that has been successfully used in HMM-based synthesis by parameterizing speech into two parts (glottal flow, vocal tract) according to the functioning of the real human voice production mechanism. In this study, a new glottal vocoding method, GlottDNN, is proposed. The GlottDNN vocoder is built on the principles(More)
This work studies the use of deep learning methods to directly model glottal excitation waveforms from context dependent text features in a text-to-speech synthesis system. Glottal vocoding is integrated into a deep neural network-based text-to-speech framework where text and acoustic features can be flexibly used as both network inputs or outputs. Long(More)
This study presents an automatic glottal inverse filtering (GIF) technique based on separating the effect of the glottal main excitation from the impulse response of the vocal tract. The proposed method is based on a non-negative matrix factorization (NMF) based decomposition of an ultra short-term spectrogram of the analyzed signal. Unlike other(More)
The composite autoregressive system can be used to estimate a speech source-filter decomposition in a rigorous manner, thus having potential use in glottal inverse filtering. By introducing a suitable prior, spectral tilt can be introduced into the source component estimation to better correspond to human voice production. However, the current(More)