Visual Speech Recognition

@article{Hassanat2014VisualSR,
  title={Visual Speech Recognition},
  author={Ahmad Basheer Hassanat},
  journal={ArXiv},
  year={2014},
  volume={abs/1409.1411}
}
Lip reading is used to understand or interpret speech without hearing it, a technique especially mastered by people with hearing difficulties. The ability to lip read enables a person with a hearing impairment to communicate with others and to engage in social activities, which otherwise would be difficult. Recent advances in the fields of computer vision, pattern recognition, and signal processing have led to a growing interest in automating this challenging task of lip reading. Indeed…
Lipreading using a comparative machine learning approach
TLDR
This paper presents a detailed study of machine learning approaches for the real-time visual recognition of spoken words; nine different classifiers are implemented and tested, and their confusion matrices among different groups of words are reported.
Enhancing quality and accuracy of speech recognition system by using multimodal audio-visual speech signal
TLDR
The experimental results show that the proposed Audio Video Speech Recognizer (AV-ASR) system exhibits a higher recognition rate than an audio-only recognizer and demonstrates robust performance.
A survey of automatic lip reading approaches
  • W. Butt, L. Lombardi
  • Computer Science
  • Eighth International Conference on Digital Information Management (ICDIM 2013)
  • 2013
TLDR
The recognition process using common visual features for improved lip reading is shown, based on shape and appearance descriptions, and detailed information about the automatic lip reading system is given.
Large-Scale Visual Speech Recognition
TLDR
This work designed and trained an integrated lipreading system, consisting of a video processing pipeline that maps raw video to stable videos of lips and sequences of phonemes, a scalable deep neural network that maps the lip videos to sequences of phoneme distributions, and a production-level speech decoder that outputs sequences of words.
Syllable-Based Indonesian Lip Reading Model
  • Adriana Kurniawan, S. Suyanto
  • Computer Science
  • 2020 8th International Conference on Information and Communication Technology (ICoICT)
  • 2020
TLDR
A syllable-based model is proposed in this research to handle the out-of-vocabulary (OOV) problem in lip reading, allowing new words that do not appear in the dictionary to be built from known syllables; a sketch of the idea follows below.
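
The paper's syllabifier is not reproduced above; as a minimal sketch of the syllable-unit idea, an OOV word can be decomposed greedily against a fixed syllable inventory, so a word absent from the dictionary can still be formed from known units. The toy inventory and the longest-match rule below are illustrative assumptions, not the paper's method.

# Sketch of the syllable-unit idea: decompose a word into syllables
# drawn from a fixed inventory, so unseen words can still be formed.
# The inventory and the greedy longest-match rule are illustrative
# assumptions, not the paper's syllabifier.
SYLLABLES = {"ma", "kan", "mi", "num", "bu", "ku", "sa", "ya"}

def decompose(word, inventory=SYLLABLES, max_len=3):
    out, i = [], 0
    while i < len(word):
        for l in range(max_len, 0, -1):          # prefer the longest match
            if word[i:i + l] in inventory:
                out.append(word[i:i + l])
                i += l
                break
        else:
            return None                          # cannot be built from this inventory
    return out

print(decompose("makan"))     # ['ma', 'kan']
print(decompose("bukusaya"))  # ['bu', 'ku', 'sa', 'ya'] -- composed from known units
print(decompose("bukumu"))    # None: 'mu' is not in this toy inventory
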
A three-dimensional approach to Visual Speech Recognition using Discrete Cosine Transforms
TLDR
This paper is the first to show that 3D feature extraction methods can be applied to continuous sequence recognition tasks despite the unknown start positions and durations of each phoneme, and confirms that 3D feature extraction methods improve accuracy compared to 2D feature extraction methods.
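
The paper's exact extraction settings are not given above; as a hedged sketch of the general technique, a 3D DCT over a short stack of mouth-region frames can be computed with SciPy, keeping the low-frequency corner of the coefficient cube as a compact spatio-temporal feature. The crop size, stack length, and number of retained coefficients below are assumptions.

# Illustrative sketch: 3D DCT features from a stack of mouth-region frames.
# Shapes and coefficient counts are assumptions, not the paper's settings.
import numpy as np
from scipy.fft import dctn

def dct3d_features(frames, keep=(4, 6, 6)):
    """frames: (T, H, W) float array of grayscale mouth crops.
    Returns the low-frequency corner of the 3D DCT-II coefficient cube,
    flattened into a single feature vector."""
    coeffs = dctn(frames, type=2, norm="ortho")   # full (T, H, W) coefficient cube
    t, h, w = keep
    return coeffs[:t, :h, :w].ravel()             # low temporal/spatial frequencies

# Example: 16 frames of 32x48 lip crops -> 4*6*6 = 144-dim feature vector
video = np.random.rand(16, 32, 48).astype(np.float64)
feat = dct3d_features(video)
print(feat.shape)  # (144,)
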
DCT-based Visual Feature Extraction for Indonesian Audiovisual Speech Recognition
TLDR
The development of a syllable-based Indonesian AVSR system (INAVSR) using the fusion of both audio and visual features is described; it is capable of reducing the word error rate (WER) of the audio-only ASR by up to 6.07% absolute.
Noise-Robust Speech Recognition System based on Multimodal Audio-Visual Approach Using Different Deep Learning Classification Techniques
TLDR
Improved accuracy over traditional HMM-based Automatic Speech Recognition (ASR) is achieved by implementing either an RNN-based or a CNN-based approach.
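
As an illustrative sketch of the CNN-based alternative to an HMM recognizer (not the paper's architecture), a small PyTorch classifier over single lip crops might look like the following; the layer sizes, word count, and the 32x48 input shape are all assumptions.

# Sketch of a CNN-based visual word classifier of the kind compared
# against HMMs. Architecture sizes are assumptions, not the paper's.
import torch
import torch.nn as nn

class LipCNN(nn.Module):
    def __init__(self, n_words=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(32 * 8 * 12, n_words)  # for 32x48 input crops

    def forward(self, x):                  # x: (batch, 1, 32, 48)
        h = self.conv(x)
        return self.fc(h.flatten(1))       # per-word logits

logits = LipCNN()(torch.randn(4, 1, 32, 48))
print(logits.shape)  # torch.Size([4, 10])
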
Lip Reading using Neural Network and Deep learning
Lip reading is a technique to understand words or speech by visual interpretation of face, mouth, and lip movement without the involvement of audio. This task is difficult as people use different…
Lip Detection and Lip Geometric Feature Extraction using Constrained Local Model for Spoken Language Identification using Visual Speech Recognition
TLDR
This paper presents a methodology for detecting lips in face images using a constrained local model (CLM) and then extracting geometric features of the lip shape; the results indicate the speaker dependency of visual speech recognition systems.
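
The CLM fitting itself is not sketched here; assuming a detector has already produced the outer-lip contour as (x, y) landmarks, geometric features such as mouth width, height, aspect ratio, and area (via the shoelace formula) can be computed as below. The 8-point contour in the example is hypothetical.

# Minimal sketch: geometric lip features from landmark points.
# Assumes an external detector (e.g. a CLM fit) already produced
# the outer-lip contour as an (N, 2) array of (x, y) coordinates.
import numpy as np

def lip_geometry(points):
    """points: (N, 2) outer-lip contour, ordered around the mouth."""
    x, y = points[:, 0], points[:, 1]
    width = x.max() - x.min()                      # horizontal mouth opening
    height = y.max() - y.min()                     # vertical mouth opening
    # Shoelace formula for the area of the closed contour.
    area = 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
    return np.array([width, height, height / width, area])

# Example with a hypothetical 8-point contour
contour = np.array([[0, 2], [2, 0], [5, -1], [8, 0], [10, 2],
                    [8, 3], [5, 4], [2, 3]], dtype=float)
print(lip_geometry(contour))
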

References

Showing 1–10 of 51 references
Visual Words for Automatic Lip-Reading
TLDR
This thesis investigates various issues faced by an automated lip-reading system and proposes a novel "visual words" based approach to automatic lip reading, including a novel automatic face localisation scheme and a lip localisation method.
Lip Localization and Viseme Classification for Visual Speech Recognition
TLDR
A new approach automatically localizes lip feature points in a speaker’s face and carries out spatio-temporal tracking of these points; the extracted visual information is then classified in order to recognize the uttered viseme (visual phoneme).
The application of manifold based visual speech units for visual speech recognition
TLDR
A large section of this thesis is dedicated to analysing the performance of the new visual speech unit model compared with that attained for standard (MPEG-4) viseme models.
Large-vocabulary audio-visual speech recognition by machines and humans
TLDR
Automatic audio-visual recognition outperforms human audio-only speech perception at low SNRs, and the gains diverge significantly at other SNRs.
A visual front-end for a continuous pose-invariant lipreading system
  • P. Lucey, S. Sridharan
  • Computer Science
  • 2008 2nd International Conference on Signal Processing and Communication Systems
  • 2008
TLDR
This paper describes a visual front-end that incorporates a pose estimator in conjunction with a parallel series of pose-specific face and facial-feature classifiers, based on the boosted cascade of simple classifiers devised by Viola and Jones.
Voiceless speech recognition using dynamic visual speech features
TLDR
A voiceless speech recognition technique that utilizes dynamic visual features to represent the facial movements during phonation; the technique is suitable for the recognition of English consonants.
Mutual information eigenlips for audio-visual speech recognition
TLDR
This paper proposes an information-theoretic approach to finding the most informative subset of eigen-features for audio-visual speech recognition tasks, retaining only those cues that have the highest mutual information with the word classes.
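
As a hedged sketch of the selection idea (not the authors' implementation), scikit-learn's mutual_info_classif can rank visual features by their mutual information with the word classes so that only the top-ranked cues are retained; the data and dimensions below are placeholders.

# Sketch of the information-theoretic selection idea: rank existing
# visual features by mutual information with word classes and keep
# the top subset. Data and dimensions are placeholders.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
feats = rng.random((200, 50))        # 200 frames x 50 eigen-features (placeholder)
labels = rng.integers(0, 10, 200)    # word-class label per frame, 10-word vocabulary

mi = mutual_info_classif(feats, labels, random_state=0)
keep = np.argsort(mi)[::-1][:20]     # indices of the 20 most informative cues
print(feats[:, keep].shape)          # (200, 20)
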
"Eigenlips" for robust speech recognition
  • C. Bregler, Y. Konig
  • Computer Science
  • Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing
  • 1994
TLDR
This study improves the performance of a hybrid connectionist speech recognition system by incorporating visual information about the corresponding lip movements, using a new visual front end and an alternative architecture for combining the visual and acoustic information.
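
A minimal sketch of the eigenlips idea, assuming grayscale lip crops are already extracted: PCA learns a basis over the flattened crops, and the projection coefficients become the visual feature vector fed to the recognizer. The crop size and component count below are illustrative.

# Sketch of "eigenlips": PCA over flattened grayscale lip crops;
# the projection coefficients serve as the visual feature vector.
# Sizes are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
lips = rng.random((500, 24, 16))               # 500 lip crops, 24x16 pixels (placeholder)
X = lips.reshape(len(lips), -1)                # flatten each crop

pca = PCA(n_components=10).fit(X)              # learn the eigenlip basis
features = pca.transform(X)                    # 10 coefficients per frame
print(features.shape)                          # (500, 10)
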
Appearance Feature Extraction versus Image Transform-Based Approach for Visual Speech Recognition
TLDR
This paper investigates the performance of HCM for feature extraction and classification, and then compares it against replacing HCM with the Fast Discrete Cosine Transform (FDCT); the results show that HCM is generally better than FDCT and provides a good distribution of the phonemes in the feature space for recognition purposes.
Audio-visual speech recognition with a hybrid SVM-HMM system
TLDR
This work proposes a hybrid SVM-HMM speech recognizer and shows how the multimodal approach leads to better performance than that obtained with either of the two modalities individually.
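
The paper's exact SVM-HMM coupling is not described above; one common reading of the hybrid idea is that an SVM supplies per-frame state posteriors (e.g. from scikit-learn's SVC(probability=True).predict_proba) and a Viterbi pass imposes HMM transition structure on top. The sketch below shows only the decoding step, with made-up probabilities.

# Minimal hybrid sketch: an SVM supplies per-frame state posteriors,
# and a Viterbi pass applies HMM transition structure on top.
# The SVM-posterior step is assumed; only the decoding is shown.
import numpy as np

def viterbi(log_post, log_trans, log_init):
    """log_post: (T, S) per-frame log posteriors from the SVM,
    log_trans: (S, S) log transition matrix, log_init: (S,) log priors.
    Returns the most likely state sequence."""
    T, S = log_post.shape
    delta = log_init + log_post[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans       # (S, S): prev -> cur
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_post[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Illustrative 3-state example with made-up probabilities
post = np.log(np.array([[0.7, 0.2, 0.1], [0.3, 0.6, 0.1],
                        [0.1, 0.6, 0.3], [0.1, 0.2, 0.7]]))
trans = np.log(np.array([[0.6, 0.3, 0.1], [0.1, 0.6, 0.3],
                         [0.1, 0.2, 0.7]]))
init = np.log(np.array([0.8, 0.1, 0.1]))
print(viterbi(post, trans, init))  # [0, 1, 1, 2]
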