Learn More
In the field of audio-visual speech recognition, multi-stream HMMs are widely used, thus how to automatically and properly determine stream weight factors using a small data set becomes an important research issue. This paper proposes a new stream-weight optimization method based on an output likelihood normalization criterion. In this method, the stream(More)
In this paper we present a new method for synthesizing multiple languages with the same voice, using HMM-based speech synthesis. Our approach, which we call HMM-based polyglot synthesis, consists of mixing speech data from several speakers in different languages, to create a speakerand language-independent (SI) acoustic model. We then adapt the resulting SI(More)
This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and(More)
For multi-stream HMM that are widely used in audio-visual speech recognition, it is important to automatically and properly adjust stream weights. This paper proposes a stream-weight optimization technique based on a likelihood-ratio maximization criterion. In our audiovisual speech recognition system, video signals are captured and converted into visual(More)
A new method for prosodic word boundary detection in continuous speech was developed based on the statistical modeling of moraic transitions of fundamental frequency (F 0 ) contours, formerly proposed by the authors. In the developed method, F 0 contours of prosodic words were modeled separately according to the accent types. An input utterance was matched(More)