This paper describes a speech recognition system that uses both acoustic and visual speech information to improve the recognition performance in noisy environments. The system consists of three components: 1) a visual module; 2) an acoustic module; and 3) a sensor fusion module. The visual module locates and tracks the lip movements of a given speaker and …
This paper describes a multimodal approach for speaker verification. The system consists of two classifiers, one using visual features, the other using acoustic features. A lip tracker is used to extract visual information from the speaking face, which provides shape and intensity features. We describe an approach for normalizing and mapping different …
This paper describes a novel approach for visual speech recognition. The shape of the mouth is modelled by an Active Shape Model, which is derived from the statistics of a training set and used to locate, track and parameterise the speaker's lip movements. The extracted parameters representing the lip shape are modelled as continuous probability …
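The Active Shape Model mentioned above rests on a standard construction: landmark shapes from a training set are averaged, and principal component analysis yields a small set of modes so that any lip shape is approximated as the mean plus a weighted sum of modes. A minimal sketch of that construction, using synthetic landmark data (the contour values and mode count are illustrative assumptions, not the paper's):

```python
import numpy as np

# Synthetic "training set": 50 lip contours, each 5 two-dimensional
# landmarks flattened to a 10-vector. A real system would use
# hand-labelled landmarks from face images.
rng = np.random.default_rng(0)
base = np.array([[0, 0], [1, 0.5], [2, 0.6], [3, 0.5], [4, 0]], float).ravel()
shapes = base + 0.05 * rng.standard_normal((50, base.size))

# Point Distribution Model: mean shape plus principal modes of variation.
x_bar = shapes.mean(axis=0)
U, s, Vt = np.linalg.svd(shapes - x_bar, full_matrices=False)
P = Vt[:2].T                     # keep the two largest modes (columns)
b = P.T @ (shapes[0] - x_bar)    # shape parameters for one training shape

# Any modelled shape is reconstructed as x ≈ x_bar + P b; the low-dimensional
# vector b is the kind of lip-shape parameterisation fed to the recogniser.
recon = x_bar + P @ b
```

In tracking, `b` is re-estimated frame by frame (with limits on each component so the shape stays plausible), and the sequence of `b` vectors forms the visual feature stream.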
This paper addresses the problem of audiovisual information fusion to provide highly robust speech recognition. We investigate methods that make different assumptions about asynchrony and conditional dependence across streams and propose a technique based on composite HMMs that can account for stream asynchrony and different levels of information …
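A common baseline for the stream combination these fusion papers build on is the weighted product rule: the acoustic and visual log-likelihoods for each hypothesis are combined with a stream exponent that shifts trust toward the lip stream as acoustic noise increases. A minimal sketch (the weight value and the example scores are illustrative assumptions, not taken from the paper):

```python
def fused_log_likelihood(log_p_audio: float, log_p_video: float, lam: float) -> float:
    """Weighted product rule in the log domain: streams get
    exponents lam and (1 - lam), with 0 <= lam <= 1."""
    return lam * log_p_audio + (1.0 - lam) * log_p_video

# Per-word stream log-likelihoods (hypothetical values).
scores = {"yes": (-10.0, -12.0), "no": (-14.0, -11.0)}

# In clean audio a large lam favours the acoustic stream; in noise
# lam would be lowered so the visual stream dominates.
lam = 0.8
fused = {w: fused_log_likelihood(a, v, lam) for w, (a, v) in scores.items()}
best = max(fused, key=fused.get)
```

Composite-HMM approaches go further by letting the two streams occupy different states within a model unit, but the stream-weighting idea above is the common starting point.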