An Investigation of End-to-End Models for Robust Speech Recognition
@article{Prasad2021AnIO,
  title={An Investigation of End-to-End Models for Robust Speech Recognition},
  author={Archiki Prasad and Preethi Jyothi and Rajbabu Velmurugan},
  journal={ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2021},
  pages={6893-6897}
}
End-to-end models for robust automatic speech recognition (ASR) have not been sufficiently well-explored in prior work. With end-to-end models, one could choose to preprocess the input speech using speech enhancement techniques and train the model using enhanced speech. Another alternative is to pass the noisy speech as input and modify the model architecture to adapt to noisy speech. A systematic comparison of these two approaches for end-to-end robust ASR has not been attempted before. We…
3 Citations
Recent Advances in End-to-End Automatic Speech Recognition
- Computer Science, ArXiv
- 2021
This paper gives an overview of recent advances in E2E models, focusing on technologies that address those challenges from an industry perspective.
A Noise-Robust Self-supervised Pre-training Model Based Speech Representation Learning for Automatic Speech Recognition
- Computer Science, ICASSP
- 2022
Experimental results show that the proposed enhanced wav2vec2.0 model not only improves ASR performance on the noisy test set, surpassing the original model, but also incurs only a small performance decrease on the clean test set.
Multiple Confidence Gates For Joint Training Of SE And ASR
- Engineering
- 2022
Joint training of a speech enhancement (SE) model and a speech recognition (ASR) model is a common solution for robust ASR in noisy environments. SE focuses on improving the auditory quality of speech,…
References
Jointly Adversarial Enhancement Training for Robust End-to-End Speech Recognition
- Computer Science, INTERSPEECH
- 2019
This paper proposes a jointly adversarial enhancement training to boost the robustness of end-to-end systems and achieves a relative error rate reduction of 4.6% over multi-condition training.
Learning Noise Invariant Features Through Transfer Learning For Robust End-to-End Speech Recognition
- Computer Science, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
This work argues that the clean classifier can force the feature extractor to learn the underlying noise invariant patterns in the noisy dataset, and proposes transfer learning from a clean dataset (WSJ) to a noisy dataset (CHiME4) for connectionist temporal classification models.
SEGAN: Speech Enhancement Generative Adversarial Network
- Computer Science, INTERSPEECH
- 2017
This work proposes the use of generative adversarial networks for speech enhancement. The model operates at the waveform level, is trained end-to-end, and incorporates 28 speakers and 40 different noise conditions, such that model parameters are shared across them.
Deep Xi as a Front-End for Robust Automatic Speech Recognition
- Computer Science, 2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE)
- 2020
The experimental investigation of Deep Xi as a front-end for robust ASR shows that Deep Xi is a viable front-end and is able to significantly increase the robustness of an ASR system.
Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation
- Computer Science, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
- 2017
This paper addresses the unsupervised domain adaptation problem for robust speech recognition, where both source and target domain speech are available, but word transcripts are only available for the source domain speech.
How Accents Confound: Probing for Accent Information in End-to-End Speech Recognition Systems
- Computer Science, ACL
- 2020
This work uses a state-of-the-art end-to-end ASR system that is trained on a large amount of US-accented English speech, and examines the effects of accent on the internal representation using three main probing techniques.
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
- Computer Science, ICML
- 2016
It is shown that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages, and is competitive with the transcription of human workers when benchmarked on standard datasets.
An investigation of deep neural networks for noise robust speech recognition
- Computer Science, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
The noise robustness of DNN-based acoustic models can match state-of-the-art performance on the Aurora 4 task without any explicit noise compensation and can be further improved by incorporating information about the environment into DNN training using a new method called noise-aware training.
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
- Computer Science
- 2017
This overview reviews recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech, with the aim of providing guidelines for those developing environmentally robust speech recognition systems.
Joint noise adaptive training for robust automatic speech recognition
- Computer Science, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2014
By formulating separation as a supervised mask estimation problem, a unified DNN framework is developed that jointly improves separation and acoustic modeling and improves performance on the Aurora-4 dataset.