This paper describes a speech recognition system that uses both acoustic and visual speech information to improve recognition performance in noisy environments. The system consists of three components: 1) a visual module; 2) an acoustic module; and 3) a sensor fusion module. The visual module locates and tracks the lip movements of a given speaker and …
This paper addresses the problem of audiovisual information fusion to provide highly robust speech recognition. We investigate methods that make different assumptions about asynchrony and conditional dependence across streams and propose a technique based on composite HMMs that can account for stream asynchrony and different levels of information …
This paper describes a multimodal approach for speaker verification. The system consists of two classifiers, one using visual features, the other using acoustic features. A lip tracker is used to extract visual information from the speaking face, which provides shape and intensity features. We describe an approach for normalizing and mapping different …
This paper describes a novel approach for visual speech recognition. The shape of the mouth is modelled by an Active Shape Model which is derived from the statistics of a training set and used to locate, track and parameterise the speaker's lip movements. The extracted parameters representing the lip shape are modelled as continuous probability …
In this work we demonstrate an improvement in state-of-the-art large vocabulary continuous speech recognition (LVCSR) performance, under clean and noisy conditions, by the use of visual information in addition to the traditional audio information. We take a decision fusion approach for the audiovisual information, where the single-modality (audio- and …
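A common form of the decision fusion described here combines the single-modality classifier scores as a weighted sum of log-likelihoods, with the audio weight lowered as noise increases. The sketch below is a generic illustration of that idea, not the paper's exact method; the example likelihood values and the weight `lam` are invented for demonstration.

```python
import numpy as np

def fuse_scores(log_audio, log_visual, lam):
    """Weighted log-likelihood decision fusion.

    lam is the audio stream weight in [0, 1]; in practice it is tuned
    to the acoustic conditions (lower lam under heavier noise).
    """
    return lam * np.asarray(log_audio) + (1.0 - lam) * np.asarray(log_visual)

# Hypothetical per-word log-likelihoods from each single-modality recogniser.
log_audio = np.log([0.2, 0.5, 0.3])    # audio stream favours word 1
log_visual = np.log([0.6, 0.3, 0.1])   # visual stream favours word 0

clean_scores = fuse_scores(log_audio, log_visual, lam=0.9)  # trust audio
noisy_scores = fuse_scores(log_audio, log_visual, lam=0.3)  # trust lips
```

With a high audio weight the fused decision follows the acoustic recogniser; as the weight drops, the visual stream dominates, which is what makes the combined system degrade gracefully in noise.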
We describe a speechreading system that uses both shape information from the lip contours and intensity information from the mouth area. Shape information is obtained by tracking and parameterising the inner and outer lip boundary in an image sequence. Intensity information is extracted from a grey-level model based on principal component analysis. In …
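The PCA-based grey-level model can be sketched in the same spirit: flatten each mouth-area image patch to a vector, fit PCA to a training set of patches, and use the leading projection coefficients as a compact per-frame intensity feature. This is an illustrative sketch only; the patch size, the number of retained components, and the random stand-in data are all assumptions.

```python
import numpy as np

# Illustrative grey-level model: mouth-area patches are flattened and
# PCA yields a compact intensity feature vector for each video frame.
rng = np.random.default_rng(1)

# Hypothetical training set: 100 mouth patches of 16x8 grey values.
patches = rng.random((100, 16 * 8))

mean_patch = patches.mean(axis=0)
_, _, vt = np.linalg.svd(patches - mean_patch, full_matrices=False)
basis = vt[:10]                          # top 10 intensity eigenvectors

def intensity_features(patch):
    """10-D PCA intensity feature for one flattened mouth patch."""
    return basis @ (np.ravel(patch) - mean_patch)

features = intensity_features(patches[0])
```

These intensity features complement the contour-based shape features, since they capture appearance cues (teeth, tongue, mouth cavity) that the lip outline alone misses.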