RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions

@inproceedings{Chiu2021RNNTMF,
  title={RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions},
  author={C. Chiu and A. Narayanan and Wei Han and Rohit Prabhavalkar and Y. Zhang and Navdeep Jaitly and R. Pang and T. Sainath and P. Nguyen and L. Cao and Yonghui Wu},
  booktitle={2021 IEEE Spoken Language Technology Workshop (SLT)},
  year={2021},
  pages={873--880}
}
In recent years, all-neural end-to-end approaches have obtained state-of-the-art results on several challenging automatic speech recognition (ASR) tasks. However, most existing works focus on building ASR models where train and test data are drawn from the same domain. This results in poor generalization characteristics on mismatched domains: e.g., end-to-end models trained on short segments perform poorly when evaluated on longer utterances. In this work, we analyze the generalization…
6 Citations

Improving RNN-T ASR Accuracy Using Untranscribed Context Audio
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context
  • 33
Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer
  • 4
  • Highly Influenced
A New Training Pipeline for an Improved Neural Transducer
  • 8
Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data

References

SHOWING 1-10 OF 33 REFERENCES
Recognizing Long-Form Speech Using Streaming End-to-End Models
  • 25
A Comparison of End-to-End Models for Long-Form Speech Recognition
  • C. Chiu, Wei Han, +11 authors Y. Wu
  • Computer Science, Engineering
  • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
  • 2019
  • 25
A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency
  • T. Sainath, Yanzhang He, +26 authors D. Zhao
  • Computer Science
  • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
  • 58
Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer
  • 172
State-of-the-Art Speech Recognition with Sequence-to-Sequence Models
  • C. Chiu, T. Sainath, +11 authors M. Bacchiani
  • Computer Science, Engineering
  • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2018
  • 642
Toward Domain-Invariant Speech Recognition via Large Scale Training
  • 34
Hybrid CTC/Attention Architecture for End-to-End Speech Recognition
  • 226
Deep Speech: Scaling up end-to-end speech recognition
  • 1,209
Specaugment on Large Scale Datasets
  • D. Park, Y. Zhang, +5 authors Yonghui Wu
  • Computer Science, Engineering
  • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
  • 27
Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin
  • 1,734