Corpus ID: 150374150

MobiVSR: A Visual Speech Recognition Solution for Mobile Devices

@article{Shrivastava2019MobiVSRAV,
  title={MobiVSR: A Visual Speech Recognition Solution for Mobile Devices},
  author={Nilay Shrivastava and Astitwa Saxena and Y. Kumar and P. Kaur and R. Shah and Debanjan Mahata},
  journal={ArXiv},
  year={2019},
  volume={abs/1905.03968}
}
Visual speech recognition (VSR) is the task of recognizing spoken language from video input only, without any audio. [...] We use depthwise-separable 3D convolution for the first time in the domain of VSR and show how it makes our model efficient. MobiVSR achieves an accuracy of 73% on the challenging Lip Reading in the Wild dataset with 6 times fewer parameters and a 20 times smaller memory footprint than the current state of the art. MobiVSR can also be compressed to 6 MB by applying post training…
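To illustrate why a depthwise-separable 3D convolution is cheaper than a standard one, the sketch below counts the weights of each. It is not MobiVSR's actual architecture: the function names, the channel counts (64 in, 128 out), and the 3x3x3 kernel are assumptions chosen only for illustration. The separable variant is modelled in the usual way, as one spatio-temporal filter per input channel (depthwise) followed by a 1x1x1 convolution that mixes channels (pointwise).

```python
def conv3d_params(c_in, c_out, kernel, bias=False):
    """Weight count of a standard 3D convolution.

    Every output channel has its own (kt x kh x kw) filter for every input channel.
    """
    kt, kh, kw = kernel
    return c_in * c_out * kt * kh * kw + (c_out if bias else 0)


def depthwise_separable_conv3d_params(c_in, c_out, kernel, bias=False):
    """Weight count of a depthwise 3D convolution followed by a 1x1x1 pointwise one."""
    kt, kh, kw = kernel
    # Depthwise step: a single (kt x kh x kw) filter per input channel.
    depthwise = c_in * kt * kh * kw + (c_in if bias else 0)
    # Pointwise step: a 1x1x1 convolution mapping c_in channels to c_out channels.
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer shape, not taken from the paper.
standard = conv3d_params(64, 128, (3, 3, 3))
separable = depthwise_separable_conv3d_params(64, 128, (3, 3, 3))
print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"reduction: {standard / separable:.1f}x")
```

For this hypothetical layer the standard convolution needs 221,184 weights and the separable one 9,920. The saving for a whole network depends on how many layers are replaced and on their shapes, so this per-layer figure should not be read as the paper's reported 6 times reduction.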
