Corpus ID: 13287586

Estimating speech from lip dynamics

@article{George2017EstimatingSF,
  title={Estimating speech from lip dynamics},
  author={Jithin Donny George and Ronan Keane and Conor Zellmer},
  journal={ArXiv},
  year={2017},
  volume={abs/1708.01198}
}
The goal of this project is to develop a limited lip reading algorithm for a subset of the English language. We consider a scenario in which no audio information is available. The raw video is processed and the position of the lips in each frame is extracted. We then prepare the lip data for processing and classify the lips into visemes and phonemes. Hidden Markov Models are used to predict the words the speaker is saying based on the sequences of classified phonemes and visemes. The GRID… 
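The pipeline described in the abstract, classifying lip data into phonemes/visemes and then using a hidden Markov model to recover the most likely word sequence, can be illustrated with a small Viterbi decoder. This is a minimal sketch, not the authors' implementation; the two-state model, three observation symbols, and all probabilities below are hypothetical.

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for an observed symbol sequence."""
    n_states = trans_p.shape[0]
    T = len(obs)
    # Work in log-probabilities to avoid underflow on long sequences.
    log_delta = np.full((T, n_states), -np.inf)
    back = np.zeros((T, n_states), dtype=int)
    log_delta[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        for j in range(n_states):
            scores = log_delta[t - 1] + np.log(trans_p[:, j])
            back[t, j] = np.argmax(scores)
            log_delta[t, j] = scores[back[t, j]] + np.log(emit_p[j, obs[t]])
    # Backtrack from the best final state.
    path = [int(np.argmax(log_delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Hypothetical 2-phoneme model observed through 3 viseme classes.
start_p = np.array([0.6, 0.4])
trans_p = np.array([[0.7, 0.3],
                    [0.4, 0.6]])
emit_p = np.array([[0.5, 0.4, 0.1],
                   [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2], start_p, trans_p, emit_p))  # → [0, 0, 1]
```

In a full lip-reading system the observations would be viseme class labels per video frame, and the hidden states would be phonemes (or phoneme-level sub-states of word models).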
1 Citation


LipVision: A Deep Learning Approach

The paper surveys prior work on lip-reading, discussing the different classifiers used, their efficiency, and the final accuracy obtained.

References


Phoneme-to-viseme Mapping for Visual Speech Recognition

These initial experiments demonstrate that the choice of visual unit requires more careful attention in audio-visual speech recognition system development, and the best visual-only recognition on the VidTIMIT database is achieved using a linguistically motivated viseme set.

Visual Words for Automatic Lip-Reading

This thesis investigates various issues faced by an automated lip-reading system and proposes a novel "visual words" based approach to automatic lip reading, which includes a novel automatic face localisation scheme and a lip localisation method.

Acoustic driven viseme identification for face animation

  • J. Zhong, W. Chou, E. Petajan
  • Computer Science
    Proceedings of First Signal Processing Society Workshop on Multimedia Signal Processing
  • 1997
An approach for extracting visemes from both the image and acoustic domains is presented; the mouth shapes, represented by feature points on the inner lip contours, are extracted through face tracking and mouth image analysis.

Baum-Welch hidden Markov model inversion for reliable audio-to-visual conversion

The hidden Markov model inversion (HMMI) technique, introduced for robust speech recognition, is extended in this paper into the audio-visual feature space. Preliminary simulation results show that the visual parameters estimated by the proposed method match the true visual parameters smoothly as well as accurately.

Visual Recognition of American Sign Language Using Hidden Markov Models.

Using hidden Markov models (HMMs), an unobtrusive single-view camera system is developed that can recognize hand gestures, namely a subset of American Sign Language (ASL), achieving high recognition rates for full-sentence ASL using only visual cues.

Hidden Markov Model for Gesture Recognition

The proposed method is applicable to any gesture represented by a multi-dimensional signal, and will be a valuable tool in telerobotics and human computer interfaces.

The Visual Microphone: Passive Recovery of Sound from Video

When sound hits an object, it causes small vibrations of the object's surface. The authors show how these minute vibrations, captured with high-speed video, can be processed to passively recover the sound; the input and recovered sounds for all of the experiments in the paper are available on the project web page.

A tutorial on hidden Markov models and selected applications in speech recognition

This tutorial reviews the theory of hidden Markov models, including the forward-backward, Viterbi, and Baum-Welch algorithms, and describes how HMMs are applied to selected problems in speech recognition.
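Among the algorithms developed in the tutorial is the forward recursion for computing the likelihood of an observation sequence under an HMM. A minimal version, with hypothetical parameters for a 2-state, 2-symbol model, looks like:

```python
import numpy as np

def forward_likelihood(obs, start_p, trans_p, emit_p):
    """P(observation sequence | model), via the forward recursion."""
    # alpha[i] = probability of the observations so far, ending in state i.
    alpha = start_p * emit_p[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ trans_p) * emit_p[:, o]
    return float(alpha.sum())

# Hypothetical 2-state model with 2 observation symbols.
start_p = np.array([0.5, 0.5])
trans_p = np.array([[0.9, 0.1],
                    [0.2, 0.8]])
emit_p = np.array([[0.8, 0.2],
                   [0.3, 0.7]])
print(forward_likelihood([0, 0, 1], start_p, trans_p, emit_p))  # → 0.1068
```

In practice the recursion is run in log space or with per-step scaling, as the tutorial discusses, since the raw probabilities underflow for long sequences.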
