Creating Simplified Version of Lip Database based on Front View of Face

Ritesh A. Magre and Ajit S. Ghodke
International Journal of Computer Applications
Recently a great deal of work has been done on audio-visual speech recognition, but less on visual speech and speaker recognition. This research belongs to the human-computer interaction (HCI) domain. This paper presents the creation of a database of visual speech and speakers in English and its preprocessing to improve recognition accuracy. We have studied the Tulips1 database, the AV Database and the CUAVE Database; on the basis of these…

Tables from this paper



Design and Recording of Czech Audio-Visual Database with Impaired Conditions for Continuous Speech Recognition

The database introduced in this paper can be used for testing visual parameterization in audio-visual speech recognition (AVSR).

Audio-visual speech recognition for difficult environments

Overall, the addition of visual features is shown to improve upon audio-only performance in noisy and multispeaker environments, and techniques are presented that yield improved speech-reading performance for moving talkers.

Improving connected letter recognition by lipreading

The authors show how recognition performance in automated speech perception can be significantly improved by additional lipreading, so-called speech-reading. They show this on an extension of a…

Automatic lipreading to enhance speech recognition (speech reading)

Describes an automatic lipreading system that has been developed; the combination of acoustic and visual recognition candidates is shown to yield a final recognition accuracy that greatly exceeds the acoustic recognition accuracy alone.

Features for Audio-Visual Speech Recognition

Five new lipreading techniques are evaluated on a hidden Markov model based visual-only recognition task and compared with an enhanced implementation of a previous lip-contour tracker; the addition of visual information to automatic speech recognition is found to improve accuracy, most markedly in acoustically noisy conditions.

Lipreading Using Shape, Shading and Scale

A non-tracked alternative is a nonlinear transform of the image using multiscale spatial analysis (MSA), which performs almost identically to AAMs in both visual and audio-visual recognition tasks on a multi-talker database of isolated letters.

Real-time lip tracking and bimodal continuous speech recognition

The experiments show that the bimodal recognizer compares favorably to the acoustic-only counterpart, and the results indicate that it is advantageous to include first derivatives of the visual features.
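The first derivatives mentioned in this summary are usually computed as simple delta features: each visual feature vector is augmented with a finite-difference estimate of its rate of change across frames. A minimal sketch of that idea (the feature dimensions and the central-difference window are illustrative assumptions, not details taken from the paper):

```python
import numpy as np

def add_delta_features(frames: np.ndarray) -> np.ndarray:
    """Append first-order delta features to a (T, D) feature sequence.

    Deltas are estimated with central finite differences along the time
    axis; the first and last frames fall back to one-sided differences.
    """
    deltas = np.gradient(frames, axis=0)
    return np.concatenate([frames, deltas], axis=1)  # shape (T, 2*D)

# Example: 5 frames of 3 hypothetical lip-shape parameters.
feats = np.arange(15, dtype=float).reshape(5, 3)
augmented = add_delta_features(feats)
print(augmented.shape)  # (5, 6)
```

The augmented vectors are then fed to the recognizer in place of the static features alone.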

Visual Speech Recognition with Stochastic Networks

The results indicate that simple hidden Markov models may be used to successfully recognize relatively unprocessed image sequences, and the system achieved performance levels equivalent to untrained humans when asked to recognize the first four English digits.

On the Integration of Auditory and Visual Parameters in an HMM-based ASR

A model is proposed that improves the performance of an audio-visual speech recognizer in an isolated-word, speaker-dependent setting, using a hybrid system based on two HMMs trained respectively on acoustic and optic data.
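The two-HMM scheme this summary describes amounts to late fusion: each candidate word has one HMM per stream, and recognition picks the word whose weighted sum of per-stream log-likelihoods is highest. A toy sketch under assumed discrete observations (the stream weight `lam` and the model layout are illustrative, not taken from the paper):

```python
import numpy as np

def log_forward(pi, A, B, obs):
    """Log-likelihood log P(obs | HMM) via the scaled forward algorithm.

    pi: (N,) initial state probabilities; A: (N, N) transition matrix;
    B: (N, M) discrete emission probabilities; obs: list of symbol indices.
    """
    alpha = pi * B[:, obs[0]]
    log_p = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()          # rescale to avoid numeric underflow
        log_p += np.log(c)
        alpha /= c
    return log_p

def recognize(word_models, audio_obs, visual_obs, lam=0.7):
    """Return the word whose acoustic and optic HMMs jointly score highest.

    word_models maps each word to a pair of (pi, A, B) HMMs, one trained
    on acoustic observations and one on lip observations; lam weights the
    acoustic stream (an illustrative choice).
    """
    best, best_score = None, -np.inf
    for word, (audio_hmm, visual_hmm) in word_models.items():
        score = (lam * log_forward(*audio_hmm, audio_obs)
                 + (1 - lam) * log_forward(*visual_hmm, visual_obs))
        if score > best_score:
            best, best_score = word, score
    return best
```

Because the streams are combined only at the score level, the two HMMs can be trained independently, which is the practical appeal of the hybrid design.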

Robust speech recognition and feature extraction using HMM2