• Computer Science
  • Published in INTERSPEECH 2002

DCT-based video features for audio-visual speech recognition

@inproceedings{Heckmann2002DCTbasedVF,
  title={DCT-based video features for audio-visual speech recognition},
  author={Martin Heckmann and Kristian Kroschel and Christophe Savariaux and Fr{\'e}d{\'e}ric Berthommier},
  booktitle={INTERSPEECH},
  year={2002}
}
Encouraged by the good performance of the DCT in audio-visual speech recognition [1], we investigate how the selection of the DCT coefficients influences the recognition scores in a hybrid ANN/HMM audio-visual speech recognition system on a continuous word recognition task with a vocabulary of 30 numbers. Three sets of coefficients, based on the mean energy, the variance and the variance relative to the mean value, were chosen. The performance of these coefficients is evaluated in a video only…
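The three selection criteria named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the block size, the synthetic frames, the function names `dct2` and `rank_coefficients`, and in particular the reading of "variance relative to the mean value" as variance divided by the absolute mean are all assumptions made for illustration.

```python
import math
import random


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)
    # Basis matrix: c[k][i] = a(k) * cos(pi * (2i + 1) * k / (2n))
    c = [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
          * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
          for i in range(n)] for k in range(n)]
    # Transform the columns first (C * block), then the rows (result * C^T).
    tmp = [[sum(c[k][i] * block[i][j] for i in range(n)) for j in range(n)]
           for k in range(n)]
    return [[sum(tmp[k][j] * c[l][j] for j in range(n)) for l in range(n)]
            for k in range(n)]


def rank_coefficients(frames, criterion, keep):
    """Return the `keep` (row, col) positions scoring highest under `criterion`.

    criterion is "energy", "variance" or "relative_variance"; the last one
    is an assumed interpretation (variance / |mean|), not taken from the paper.
    """
    n = len(frames[0])
    coeffs = [dct2(f) for f in frames]
    scores = {}
    for k in range(n):
        for l in range(n):
            vals = [c[k][l] for c in coeffs]
            mean = sum(vals) / len(vals)
            energy = sum(v * v for v in vals) / len(vals)
            var = max(energy - mean * mean, 0.0)  # clamp rounding noise
            if criterion == "energy":
                scores[(k, l)] = energy
            elif criterion == "variance":
                scores[(k, l)] = var
            elif criterion == "relative_variance":
                # Small constant avoids division by zero for zero-mean terms.
                scores[(k, l)] = var / (abs(mean) + 1e-12)
            else:
                raise ValueError("unknown criterion: " + criterion)
    return sorted(scores, key=scores.get, reverse=True)[:keep]


if __name__ == "__main__":
    # Synthetic 8x8 "mouth region" frames stand in for real video data.
    random.seed(0)
    frames = [[[random.uniform(100, 150) for _ in range(8)] for _ in range(8)]
              for _ in range(20)]
    for name in ("energy", "variance", "relative_variance"):
        print(name, rank_coefficients(frames, name, 5))
```

On image-like data the energy criterion tends to favour the DC and low-frequency terms, while the two variance-based criteria reward coefficients that change from frame to frame. The crude `1e-12` guard lets near-zero-mean coefficients dominate the relative-variance ranking, so a real system would need a more careful normalisation.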

Citations

Publications citing this paper.
Visual speech features representation for automatic lip-reading

A PCA Based Visual DCT Feature Extraction Method for Lip-Reading

Appearance Feature Extraction versus Image Transform-Based Approach for Visual Speech Recognition

Hyper column model vs. fast DCT for feature extraction in visual Arabic speech recognition

Computer lipreading via hybrid deep neural network hidden Markov models

The Impact of Reduced Video Quality on Visual Speech Recognition

Biosignal-Based Spoken Communication: A Survey

Decoding visemes: improving machine lipreading (PhD thesis)

References

Publications referenced by this paper.
Optimal weighting of posteriors for audio-visual speech recognition

An image transform approach for HMM based automatic lipreading

On the integration of auditory and visual parameters in an HMM-based ASR

  • H. J. M. Steeneken, P. Varga, M. Tomlinson, D. Jones
  • Speechreading by Man and Machine: Models, Systems and Applications
  • 1992