Streaming End-to-end Speech Recognition for Mobile Devices

@inproceedings{He2019StreamingES,
  title={Streaming End-to-end Speech Recognition for Mobile Devices},
  author={Yanzhang He and T. Sainath and Rohit Prabhavalkar and Ian McGraw and R. Alvarez and Ding Zhao and David Rybach and A. Kannan and Y. Wu and R. Pang and Qiao Liang and Deepti Bhatia and Yuan Shangguan and Bo Li and G. Pundak and K. Sim and Tom Bagby and Shuo-Yiin Chang and K. Rao and A. Gruenstein},
  booktitle={ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2019},
  pages={6381--6385}
}
  • Yanzhang He, T. Sainath, +17 authors A. Gruenstein
  • Published 2019
  • Computer Science
  • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • End-to-end (E2E) models, which directly predict output character sequences given input speech, are good candidates for on-device speech recognition. [...] Key Result: In experimental evaluations, we find that the proposed approach can outperform a conventional CTC-based model in terms of both latency and accuracy in a number of evaluation categories.
    183 Citations


    A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency
    • T. Sainath, Yanzhang He, +26 authors D. Zhao
    • Computer Science
    • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • 2020
    • 51
    Two-Pass End-to-End Speech Recognition
    • 40
    Towards Fast and Accurate Streaming End-To-End ASR
    • Bo Li, Shuo-Yiin Chang, +4 authors Yonghui Wu
    • Computer Science, Engineering
    • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • 2020
    • 29
    Recognizing Long-Form Speech Using Streaming End-to-End Models
    • 25
    Streaming end-to-end multi-talker speech recognition
    Using Speech Synthesis to Train End-To-End Spoken Language Understanding Models
    • 8
    VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition
    • 2
    A Comparison of End-to-End Models for Long-Form Speech Recognition
    • Chung-Cheng Chiu, Wei Han, +11 authors Y. Wu
    • Computer Science, Engineering
    • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
    • 2019
    • 25
    Multistate Encoding with End-To-End Speech RNN Transducer Network
    A review of on-device fully neural end-to-end automatic speech recognition algorithms
    • Highly Influenced
