WaveNet: A Generative Model for Raw Audio
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. It shows that the model can be efficiently trained on data with tens of thousands of samples per second of audio, and that it can also be employed as a discriminative model, returning promising results for phoneme recognition.
Parallel WaveNet: Fast High-Fidelity Speech Synthesis
The recently-developed WaveNet architecture is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous…
Statistical parametric speech synthesis using deep neural networks
This paper examines an alternative scheme based on a deep neural network (DNN), in which the relationship between input texts and their acoustic realizations is modeled by a DNN. Experimental results show that the DNN-based systems outperformed the HMM-based systems with similar numbers of parameters.
The HMM-based speech synthesis system (HTS) version 2.0
This paper describes HTS version 2.0 in detail, as well as future release plans, which include a number of new features useful to both speech synthesis researchers and developers.
Statistical Parametric Speech Synthesis
This paper gives a general overview of techniques in statistical parametric speech synthesis, and contrasts these techniques with the more conventional unit selection technology that has dominated speech synthesis over the last ten years.
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Experimental results show that neural end-to-end TTS models trained on the LibriTTS corpus achieved mean opinion scores above 4.0 in naturalness for five out of six evaluation speakers.
Speech Synthesis Based on Hidden Markov Models
This paper gives a general overview of hidden Markov model (HMM)-based speech synthesis, which has recently been demonstrated to be very effective in synthesizing speech. The main advantage of this…
Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
  • H. Zen, H. Sak
  • Computer Science
  • IEEE International Conference on Acoustics…
  • 19 April 2015
Experimental results in subjective listening tests show that the proposed architecture can synthesize natural sounding speech without requiring utterance-level batch processing.
A Hidden Semi-Markov Model-Based Speech Synthesis System
Subjective listening test results show that use of HSMMs, which can be viewed as HMMs with explicit state duration PDFs, improves the reported naturalness of synthesized speech.