Keiichiro Oura

In conventional speech synthesis, large amounts of phonetically balanced speech data recorded in highly controlled recording studio environments are typically required to build a voice. Although using such data is a straightforward solution for high quality synthesis, the number of voices available will always be limited, because recording costs are high. …
A statistical parametric approach to speech synthesis based on hidden Markov models (HMMs) has grown in popularity over the last few years. In this approach, spectrum, excitation, and duration of speech are simultaneously modeled by context-dependent HMMs, and speech waveforms are generated from the HMMs themselves. Since December 2002, we have publicly …
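As a brief sketch of the generation step mentioned above (the notation below is the standard one from the literature, not from this abstract): with the state-level mean vectors and covariance matrices stacked into $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, and $\boldsymbol{W}$ the window matrix that appends dynamic (delta) features to the static trajectory $\boldsymbol{c}$, the speech parameters are generated by maximum likelihood as

\[
\hat{\boldsymbol{c}} \;=\; \arg\max_{\boldsymbol{c}} \, \mathcal{N}\!\left(\boldsymbol{W}\boldsymbol{c};\, \boldsymbol{\mu}, \boldsymbol{\Sigma}\right)
\;=\; \left(\boldsymbol{W}^{\top}\boldsymbol{\Sigma}^{-1}\boldsymbol{W}\right)^{-1}\boldsymbol{W}^{\top}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu},
\]

after which the resulting spectral and excitation trajectories are passed to a vocoder to produce the waveform.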
In the EMIME project we have studied unsupervised cross-lingual speaker adaptation. We have employed an HMM statistical framework for both speech recognition and synthesis, which provides transformation mechanisms to adapt the synthesized voice in TTS (text-to-speech) using the recognized voice in ASR (automatic speech recognition). An important application …
A statistical parametric approach to singing voice synthesis based on hidden Markov models (HMMs) has grown in popularity over the last few years. In this approach, the spectrum, excitation, and duration of singing voices are simultaneously modeled with context-dependent HMMs, and waveforms are generated from the HMMs themselves. In December 2009, we started a free …
This paper investigates how to use neural networks in statistical parametric speech synthesis. Recently, deep neural networks (DNNs) have been used for statistical parametric speech synthesis. However, exactly how DNNs should be used within this framework has not been studied thoroughly. A generation process of statistical …
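As an illustrative sketch only (the paper's actual architecture and feature sets are not given in this snippet), a DNN acoustic model in this setting maps frame-level linguistic features to vocoder parameters; the dimensions and layer sizes below are placeholders.

```python
# Minimal sketch of a DNN acoustic model for statistical parametric
# speech synthesis: frame-level linguistic features in, vocoder
# parameters (e.g. spectral and excitation features) out.
# All dimensions and hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn

LINGUISTIC_DIM = 355   # placeholder: context features per frame
ACOUSTIC_DIM = 187     # placeholder: vocoder parameters per frame

model = nn.Sequential(
    nn.Linear(LINGUISTIC_DIM, 1024), nn.Tanh(),
    nn.Linear(1024, 1024), nn.Tanh(),
    nn.Linear(1024, ACOUSTIC_DIM),
)

criterion = nn.MSELoss()  # frame-wise regression objective
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One training step on random stand-in data (a batch of 256 frames).
x = torch.randn(256, LINGUISTIC_DIM)
y = torch.randn(256, ACOUSTIC_DIM)

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
print(f"frame-level MSE: {loss.item():.4f}")
```

At synthesis time the predicted acoustic trajectories would be smoothed (e.g. by the parameter generation step above) and fed to a vocoder.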
In hidden Markov models (HMMs), state duration probabilities decrease exponentially with time, which is an inappropriate representation of the temporal structure of speech. One solution to this problem is to integrate state duration probability distributions explicitly into the HMM; the resulting model is known as a hidden semi-Markov model (HSMM). Although a …
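To make the exponential-decay point concrete: if $a_{ii}$ is the self-transition probability of state $i$, the probability of remaining in that state for exactly $d$ frames is

\[
p_i(d) \;=\; a_{ii}^{\,d-1}\left(1 - a_{ii}\right), \qquad d = 1, 2, \ldots,
\]

which decays geometrically in $d$. An HSMM instead models $p_i(d)$ with an explicit duration distribution, for example a Gaussian $p_i(d) = \mathcal{N}(d;\, m_i, \sigma_i^2)$, so that state durations can be fitted to the observed temporal structure of speech.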
Our recent experiments with HMM-based speech synthesis systems have demonstrated that speaker-adaptive HMM-based speech synthesis (which uses an ‘average voice model’ plus model adaptation) is robust to non-ideal speech data that are recorded under various conditions and with varying microphones, that are not perfectly clean, and/or that lack phonetic …