Audio visual word spotting

@inproceedings{Liu2004AudioVW,
  title={Audio visual word spotting},
  author={Ming Liu and Ziyou Xiong and Stephen M. Chu and ZhenQiu Zhang and Thomas S. Huang},
  booktitle={2004 IEEE International Conference on Acoustics, Speech, and Signal Processing},
  year={2004},
  volume={3},
  pages={iii-785}
}
The task of word spotting is to detect and verify specific words embedded in unconstrained speech. Most word spotters based on hidden Markov models (HMMs) share the noise-robustness problem of speech recognizers: their performance drops significantly in noisy environments. Visual speech information has been shown to improve the noise robustness of speech recognizers (Neti, C. et al., 2000; Nefian, A.V. et al., 2002; Potamianos, G. et al., 2003). We add visual speech…

Citations

Publications citing this paper.

A Novel Lip Descriptor for Audio-Visual Keyword Spotting Based on Adaptive Decision Fusion

  • IEEE Transactions on Multimedia
  • 2016

Audio-visual keyword spotting based on adaptive decision fusion under noisy conditions for human-robot interaction

  • 2014 IEEE International Conference on Robotics and Automation (ICRA)
  • 2014

Audio-Visual Keyword Spotting Based on Multidimensional Convolutional Neural Network

  • 2018 25th IEEE International Conference on Image Processing (ICIP)
  • 2018

Audio-visual Keyword Spotting for Mandarin Based on Discriminative Local Spatial-Temporal Descriptors

  • 2014 22nd International Conference on Pattern Recognition
  • 2014

Audio/video fusion for objects recognition

  • 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems
  • 2009

An audio-visual fusion framework with joint dimensionality reduction

  • 2008 IEEE International Conference on Acoustics, Speech and Signal Processing
  • 2008

References

Publications referenced by this paper.

A coupled HMM for audio-visual speech recognition

  • 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 2002

Face detection with information-based maximum discrimination

  • Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition
  • 1997

Maximum likelihood face detection

  • Proceedings of the Second International Conference on Automatic Face and Gesture Recognition
  • 1996