Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces

@inproceedings{Zhang2020FastSA,
  title={Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces},
  author={F. Zhang and Yongqiang Wang and Xiaohui Zhang and Chunxi Liu and Yatharth Saraf and G. Zweig},
  booktitle={INTERSPEECH},
  year={2020}
}
  • F. Zhang, Yongqiang Wang, +3 authors G. Zweig
  • Published in INTERSPEECH 2020
  • Engineering, Computer Science
In this work, we first show that on the widely used LibriSpeech benchmark, our transformer-based context-dependent connectionist temporal classification (CTC) system produces state-of-the-art results. We then show that using wordpieces as modeling units combined with CTC training, we can greatly simplify the engineering pipeline compared to conventional frame-based cross-entropy training by excluding all the GMM bootstrapping, decision tree building and force alignment steps, while still…
9 Citations
Feature Replacement and Combination for Hybrid ASR Systems
  • Highly Influenced
Benchmarking LF-MMI, CTC And RNN-T Criteria For Streaming ASR
  • 3 citations
Fast Text-Only Domain Adaptation of RNN-Transducer Prediction Network
Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios
Improving RNN Transducer Based ASR with Auxiliary Tasks
  • 7 citations
Streaming Attention-Based Models with Augmented Memory for End-To-End Speech Recognition
  • Ching-feng Yeh, Yongqiang Wang, +4 authors M. Seltzer
  • Computer Science
  • 2021 IEEE Spoken Language Technology Workshop (SLT)
  • 2021
  • 3 citations
Towards Consistent Hybrid HMM Acoustic Modeling
Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition
  • 8 citations
