# Speech recognition with deep recurrent neural networks

```bibtex
@article{Graves2013SpeechRW,
  title   = {Speech recognition with deep recurrent neural networks},
  author  = {Alex Graves and Abdel-rahman Mohamed and Geoffrey E. Hinton},
  journal = {2013 IEEE International Conference on Acoustics, Speech and Signal Processing},
  year    = {2013},
  pages   = {6645--6649}
}
```

Recurrent neural networks (RNNs) are a powerful model for sequential data. [...] When trained end-to-end with suitable regularisation, we find that deep Long Short-term Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition benchmark, which to our knowledge is the best recorded score.
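The deep networks in this paper stack several Long Short-term Memory layers. The core of such a layer is the gated LSTM cell; a minimal NumPy sketch of one time step is below (this is an illustrative textbook formulation, not the authors' implementation, and all variable names are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4H, D), U: (4H, H), b: (4H,).
    Gate order in the stacked weights: input, forget, candidate, output."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])        # input gate
    f = sigmoid(z[H:2*H])      # forget gate
    g = np.tanh(z[2*H:3*H])    # candidate cell state
    o = sigmoid(z[3*H:4*H])    # output gate
    c = f * c_prev + i * g     # new cell state: keep + write
    h = o * np.tanh(c)         # new hidden state (layer output)
    return h, c

# Run a toy sequence through a single LSTM layer.
rng = np.random.default_rng(0)
D, H, T = 3, 4, 5
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(T):
    h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
```

A "deep" LSTM in the paper's sense simply feeds each layer's `h` sequence as the `x` inputs of the next layer.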

## 6,992 Citations

Recurrent Neural Networks for End-to-End Speech Recognition: A Comparative Analysis

- Computer Science
- 2018

A comparative analysis of end-to-end speech recognition with different RNN architectures: simple RNN cells (SRNN), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and bidirectional variants of each, compared on the LibriSpeech corpus.

Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition

- Computer Science, ArXiv
- 2014

Novel LSTM based RNN architectures which make more effective use of model parameters to train acoustic models for large vocabulary speech recognition are presented.

Deep long short-term memory networks for speech recognition

- Computer Science, 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP)
- 2016

The experiments on 3rd CHiME challenge and Aurora-4 show that the stacks of hybrid model with FNN post-processor outperform stand-alone FNN and LSTM and the other hybrid models for robust speech recognition.

Deep Belief Neural Networks and Bidirectional Long-Short Term Memory Hybrid for Speech Recognition

- Computer Science
- 2015

Results show that the new DBNN-BLSTM hybrid, used as the acoustic model for Large Vocabulary Continuous Speech Recognition (LVCSR), increases word recognition accuracy; however, it has many parameters and may suffer performance issues in real-time applications.

Recurrent deep neural networks for robust speech recognition

- Computer Science, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2014

Full recurrent connections are added to a hidden layer of a conventional feedforward DNN, allowing the model to capture temporal dependencies in deep representations and achieve state-of-the-art performance without front-end preprocessing, speaker-adaptive training, or multiple decoding passes.

Recent Trends in Application of Neural Networks to Speech Recognition

- Computer Science
- 2016

This paper compares the training and test character error rates of DNN, Recurrent Dynamic Neural Network (RDNN), and Bi-Directional Deep Neural Network (BRDNN) models while roughly controlling for the total number of free parameters in each model.

End-to-End Online Speech Recognition with Recurrent Neural Networks

- Computer Science
- 2017

An efficient GPU-based RNN training framework for the truncated backpropagation through time (BPTT) algorithm, suitable for online (continuous) training, together with an online version of the connectionist temporal classification (CTC) loss computation in which the original CTC loss is estimated over a partial sliding window.
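Truncated BPTT, as used here, backpropagates only within fixed-length windows of a long sequence while carrying the hidden state forward across window boundaries. The windowing itself can be sketched in a few lines (an illustrative sketch, not this paper's framework; names are assumptions):

```python
def tbptt_windows(seq_len, window, stride=None):
    """Yield (start, end) index pairs for truncated-BPTT windows.
    Gradients flow only inside a window; the RNN hidden state at a
    window's end is passed into the next window without gradient."""
    stride = stride or window
    start = 0
    while start < seq_len:
        yield start, min(start + window, seq_len)
        start += stride

# For a 10-frame utterance with 4-frame truncation windows:
windows = list(tbptt_windows(10, window=4))
```

This is what makes the method online: each window can be trained as soon as its frames arrive, without waiting for the full utterance.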

Performance Evaluation of Deep Neural Networks Applied to Speech Recognition: RNN, LSTM and GRU

- Computer Science, J. Artif. Intell. Soft Comput. Res.
- 2019

RNN, LSTM, and GRU networks are evaluated and compared on a reduced TED-LIUM speech data set; the results show that LSTM achieves the best word error rates, while GRU optimization is faster and achieves word error rates close to LSTM.

Recurrent Neural Networks for Speech Recognition

- Computer Science
- 2014

Introduces recurrent neural networks for acoustic modeling that are unfolded in time for a fixed number of time steps, where the unfolded layers corresponding to the recurrent layer have time-shifted inputs and tied weight matrices.

Recurrent support vector machines for speech recognition

- Computer Science, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2016

This paper illustrates small but consistent gains from replacing the softmax layer in an RNN with Support Vector Machines (SVMs), which are jointly learned using a sequence-level max-margin criterion instead of cross-entropy.

## References

Showing 1-10 of 35 references.

Revisiting Recurrent Neural Networks for robust ASR

- Computer Science, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2012

Revisits the Recurrent Neural Network, which explicitly models the Markovian dynamics of a set of observations through a non-linear function with a much larger hidden state space than traditional sequence models such as the HMM.

Sequence Transduction with Recurrent Neural Networks

- Computer Science, Biology, ArXiv
- 2012

This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence.

Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks

- Computer Science, ICML
- 2006

This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems of sequence learning and post-processing.
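The CTC objective sums the probability of every frame-level alignment (including blanks and repeats) that collapses to the target labelling, computed efficiently with a forward dynamic program. A minimal NumPy sketch of the forward (alpha) recursion, checked on a toy example against brute-force enumeration of all alignments (an illustrative sketch of the published algorithm; all names are assumptions):

```python
import itertools
import numpy as np

BLANK = 0

def ctc_forward(probs, labels):
    """CTC forward (alpha) recursion. probs: (T, K) per-frame label
    probabilities; labels: target sequence without blanks."""
    T = probs.shape[0]
    ext = [BLANK]                       # interleave blanks: -a-b-
    for l in labels:
        ext += [l, BLANK]
    S = len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, BLANK]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]         # stay on same symbol
            if s > 0:
                a += alpha[t - 1, s - 1]  # advance one symbol
            if s > 1 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]  # skip a blank between distinct labels
            alpha[t, s] = a * probs[t, ext[s]]
    return alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)

def collapse(path):
    """CTC's B mapping: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for p in path:
        if p != prev and p != BLANK:
            out.append(p)
        prev = p
    return out

# Toy check: 4 frames, 3 symbols (blank + 2 labels), target "1 2".
rng = np.random.default_rng(1)
T, K = 4, 3
probs = rng.random((T, K))
probs /= probs.sum(axis=1, keepdims=True)
target = [1, 2]
p_ctc = ctc_forward(probs, target)
brute = sum(np.prod([probs[t, p] for t, p in enumerate(path)])
            for path in itertools.product(range(K), repeat=T)
            if collapse(path) == target)
```

The dynamic program and the 3^4-path enumeration agree, which is the point of the recursion: exponentially many alignments, summed in O(T * S) time.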

Deep Neural Networks for Acoustic Modeling in Speech Recognition

- Computer Science
- 2012

This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.

Bidirectional recurrent neural networks

- Computer Science, IEEE Trans. Signal Process.
- 1997

It is shown how the proposed bidirectional structure can be easily modified to allow efficient estimation of the conditional posterior probability of complete symbol sequences without making any explicit assumption about the shape of the distribution.
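The bidirectional idea is to run one recurrent pass left-to-right and another right-to-left, then combine the two per-frame states so every output is conditioned on the whole input sequence. A minimal NumPy sketch with simple tanh RNNs (illustrative only; the 1997 paper's exact formulation differs, and all names are assumptions):

```python
import numpy as np

def rnn_pass(xs, W, U, b):
    """Simple tanh RNN over a sequence; returns the hidden state per frame."""
    H = U.shape[0]
    h, out = np.zeros(H), []
    for x in xs:
        h = np.tanh(W @ x + U @ h + b)
        out.append(h)
    return np.stack(out)

def birnn(xs, fwd, bwd):
    """Bidirectional RNN: forward pass + backward pass over the reversed
    input, re-reversed to frame order, concatenated per frame."""
    hf = rnn_pass(xs, *fwd)
    hb = rnn_pass(xs[::-1], *bwd)[::-1]
    return np.concatenate([hf, hb], axis=1)

rng = np.random.default_rng(0)
D, H, T = 3, 4, 6
def make_params():
    return (rng.normal(scale=0.1, size=(H, D)),
            rng.normal(scale=0.1, size=(H, H)),
            np.zeros(H))

xs = rng.normal(size=(T, D))
y = birnn(xs, make_params(), make_params())   # shape (T, 2H)
```

Because both directions are available at every frame, the network can estimate posteriors over complete symbol sequences, which is what makes bidirectional layers a natural fit for the deep LSTM stacks in the main paper.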

Recurrent Neural Networks for Noise Reduction in Robust ASR

- Computer Science, INTERSPEECH
- 2012

This work introduces a model which uses a deep recurrent autoencoder neural network to denoise input features for robust ASR, and demonstrates that the model is competitive with existing feature-denoising approaches on the Aurora2 task and outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.

Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition

- Computer Science, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2012

The proposed CNN architecture is applied to speech recognition within the framework of a hybrid NN-HMM model, using local filtering and max-pooling in the frequency domain to normalize speaker variance and achieve higher multi-speaker speech recognition performance.

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

- Computer Science, IEEE Signal Processing Magazine
- 2012

This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.

Tandem Connectionist Feature Extraction for Conversational Speech Recognition

- Computer Science, MLMI
- 2004

The paper shows that MLP transformations yield variables that have regular distributions, which can be further modified by using logarithm to make the distribution easier to model by a Gaussian-HMM.

Connectionist Speech Recognition: A Hybrid Approach

- Computer Science
- 1993

From the Publisher:
Connectionist Speech Recognition: A Hybrid Approach describes the theory and implementation of a method to incorporate neural network approaches into state-of-the-art continuous…