Hiromichi Kawanami

This paper describes simple design methods for corpus-based visual speech synthesis. Our approach requires only a database of synchronized real images and speech. Visual speech is synthesized by concatenating real image segments and speech segments selected from the database. In order to perform all processes automatically, e.g. feature extraction, segment…
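As an illustrative sketch only (the segment structure, cost terms, and feature layout below are assumptions, not the paper's actual design), database-driven concatenative synthesis of this kind can be pictured as selecting units by a join-smoothness cost and splicing them:

# Hypothetical sketch of segment selection and concatenation for
# corpus-based audio-visual synthesis; names and costs are illustrative.
import numpy as np

class Segment:
    """One synchronized audio-visual unit cut from the recorded database."""
    def __init__(self, phone, audio, frames, join_feat):
        self.phone = phone          # phonetic label of the unit
        self.audio = audio          # waveform samples (1-D array)
        self.frames = frames        # image frames (T x H x W array)
        self.join_feat = join_feat  # boundary feature used for the join cost

def select_segments(target_phones, database):
    """Greedy unit selection: for each target phone, pick the candidate
    whose boundary features join most smoothly with the previous choice."""
    chosen, prev = [], None
    for phone in target_phones:
        candidates = [s for s in database if s.phone == phone]
        if not candidates:
            raise ValueError(f"no unit for phone {phone!r}")
        if prev is None:
            best = candidates[0]
        else:
            # concatenation cost: distance between adjacent boundary features
            best = min(candidates,
                       key=lambda s: np.linalg.norm(s.join_feat - prev.join_feat))
        chosen.append(best)
        prev = best
    return chosen

def concatenate(segments):
    """Join the selected units into one audio track and one frame sequence."""
    audio = np.concatenate([s.audio for s in segments])
    frames = np.concatenate([s.frames for s in segments], axis=0)
    return audio, frames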
A voice conversion method is applied to synthesizing emotional speech from standard read (neutral) speech. Pairs of neutral speech and emotional speech are used to train the conversion rules. The conversion adopts a GMM (Gaussian Mixture Model) with DFW (Dynamic Frequency Warping). We also adopt STRAIGHT, the high-quality speech analysis-synthesis algorithm. …
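A minimal sketch of joint-density GMM spectral conversion in this general style is given below; the feature dimensions, component count, and variable names are assumptions, and the DFW and STRAIGHT stages are not reproduced here.

# Sketch: fit a GMM on stacked [source; target] frames, then map a source
# frame to the expected target frame via the conditional mean per component.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_joint_gmm(src_feats, tgt_feats, n_components=8):
    """src_feats, tgt_feats: time-aligned frames, each of shape (N, D)."""
    joint = np.hstack([src_feats, tgt_feats])        # shape (N, 2D)
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(joint)
    return gmm

def convert(gmm, x, dim):
    """Convert one source frame x (shape (dim,)) to an estimated target frame."""
    log_resp = np.zeros(gmm.n_components)
    cond_means = []
    for m in range(gmm.n_components):
        mu_x = gmm.means_[m, :dim]
        mu_y = gmm.means_[m, dim:]
        S = gmm.covariances_[m]
        Sxx, Syx = S[:dim, :dim], S[dim:, :dim]
        inv = np.linalg.inv(Sxx)
        diff = x - mu_x
        # source-marginal log density of component m (constants cancel later)
        log_resp[m] = (np.log(gmm.weights_[m])
                       - 0.5 * diff @ inv @ diff
                       - 0.5 * np.linalg.slogdet(Sxx)[1])
        cond_means.append(mu_y + Syx @ inv @ diff)   # E[y | x, component m]
    resp = np.exp(log_resp - log_resp.max())
    resp /= resp.sum()                               # posteriors p(m | x)
    return sum(r * cm for r, cm in zip(resp, cond_means))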
An investigation was conducted into how the prosodic features of emotional speech change depending on emotion level. The analysis of fundamental frequency (F0) contours and speech rates implied that humans have several ways to express emotions and use them rather randomly. A further investigation was conducted into which acoustic features were important to…
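Purely as an illustration of the kind of per-level prosodic analysis described above (the field names and the mora-based rate measure are assumptions), utterance-level F0 and speech-rate statistics might be aggregated by emotion level like this:

import numpy as np
from collections import defaultdict

def summarize(utterances):
    """utterances: list of dicts with keys 'emotion_level' (int),
    'f0_contour' (Hz array, 0 = unvoiced), 'n_morae' (int), 'duration_s' (float)."""
    by_level = defaultdict(lambda: {"f0": [], "rate": []})
    for u in utterances:
        voiced = u["f0_contour"][u["f0_contour"] > 0]
        if voiced.size:
            by_level[u["emotion_level"]]["f0"].append(voiced.mean())
        by_level[u["emotion_level"]]["rate"].append(u["n_morae"] / u["duration_s"])
    return {level: {"mean_f0_hz": float(np.mean(v["f0"])) if v["f0"] else None,
                    "speech_rate_morae_per_s": float(np.mean(v["rate"]))}
            for level, v in by_level.items()}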
Through analyses of the fundamental frequency contours and speech rates of dialogue speech and also of read speech, prosodic rules were derived for the synthesis of spoken dialogue. The fundamental frequency contours were first decomposed into phrase and accent components based on the superpositional model, and then their command…
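For reference, a superpositional model of this kind represents log F0 as a baseline plus phrase-command responses plus accent-command responses. The sketch below generates a contour from given commands; the time constants (alpha, beta, gamma) are conventional defaults, not values from the paper.

import numpy as np

def phrase_response(t, alpha=3.0):
    """Response of the phrase control mechanism, Gp(t)."""
    return np.where(t >= 0, alpha ** 2 * t * np.exp(-alpha * t), 0.0)

def accent_response(t, beta=20.0, gamma=0.9):
    """Response of the accent control mechanism, Ga(t)."""
    return np.where(t >= 0,
                    np.minimum(1.0 - (1.0 + beta * t) * np.exp(-beta * t), gamma),
                    0.0)

def f0_contour(t, fb, phrase_cmds, accent_cmds):
    """t: time axis (s); fb: baseline F0 (Hz);
    phrase_cmds: list of (onset_time, amplitude);
    accent_cmds: list of (onset_time, offset_time, amplitude)."""
    log_f0 = np.full_like(t, np.log(fb))
    for t0, ap in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0)
    for t1, t2, aa in accent_cmds:
        log_f0 += aa * (accent_response(t - t1) - accent_response(t - t2))
    return np.exp(log_f0)

# Example: one phrase command and one accent command over a 2-second span.
t = np.linspace(0.0, 2.0, 200)
f0 = f0_contour(t, fb=120.0, phrase_cmds=[(0.0, 0.5)],
                accent_cmds=[(0.3, 0.8, 0.4)])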
This paper presents our new approach to modeling tone coarticulation in Chinese continuous speech for tone recognition. We suggest that coarticulation effects between two neighboring tones are rather unstable, since they may be uni-directional, bi-directional, or absent despite identical phonetic contexts. We suggest this instability is due to non-local prosodic…
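As an illustrative sketch only (not the paper's model; the fixed-length resampled F0 contour is an assumed feature), the variability of each tonal context could be inspected by grouping syllable contours by their left and right neighboring tones:

import numpy as np
from collections import defaultdict

def resample_contour(f0, n_points=10):
    """Linearly resample a syllable F0 contour to a fixed length."""
    x = np.linspace(0.0, 1.0, len(f0))
    return np.interp(np.linspace(0.0, 1.0, n_points), x, f0)

def group_by_context(syllables):
    """syllables: list of (prev_tone, tone, next_tone, f0_contour) tuples."""
    groups = defaultdict(list)
    for prev_t, tone, next_t, f0 in syllables:
        groups[(prev_t, tone, next_t)].append(resample_contour(np.asarray(f0, float)))
    # per-context mean contour and per-point standard deviation
    return {ctx: (np.mean(c, axis=0), np.std(c, axis=0))
            for ctx, c in groups.items()}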
Our research goal is to construct a Japanese TTS (Text-to-Speech) system that can output various kinds of prosody. Since such synthetic speech is useful in practical applications, many TTS systems implement global prosodic control processing, but fundamentally they are designed to output speech with standard pitch and speech rate. We discuss synthesis…
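A minimal sketch of global prosodic control of the kind mentioned above is given below: uniformly scaling pitch and speech rate before waveform generation. The parameter names and example values are illustrative assumptions, not this system's interface.

import numpy as np

def control_prosody(f0_contour, durations, pitch_scale=1.0, rate_scale=1.0):
    """f0_contour: F0 values in Hz (0 = unvoiced);
    durations: per-phone durations in seconds;
    pitch_scale: 1.0 keeps standard pitch, >1.0 raises it;
    rate_scale: 1.0 keeps the standard speech rate, >1.0 speaks faster."""
    f0 = np.where(f0_contour > 0, f0_contour * pitch_scale, 0.0)
    dur = np.asarray(durations) / rate_scale
    return f0, dur

# Example: raise pitch by 10% and speak 20% faster.
f0, dur = control_prosody(np.array([0.0, 110.0, 120.0, 115.0]),
                          [0.08, 0.12, 0.10], pitch_scale=1.1, rate_scale=1.2)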