Speech recognition with deep recurrent neural networks

@inproceedings{Graves2013SpeechRW,
  title={Speech recognition with deep recurrent neural networks},
  author={Alex Graves and Abdel-rahman Mohamed and Geoffrey E. Hinton},
  booktitle={2013 IEEE International Conference on Acoustics, Speech and Signal Processing},
  year={2013},
  pages={6645--6649}
}
Recurrent neural networks (RNNs) are a powerful model for sequential data. […] When trained end-to-end with suitable regularisation, we find that deep Long Short-term Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition benchmark, which to our knowledge is the best recorded score.
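The core architecture the abstract refers to is a stack of bidirectional LSTM layers, where each layer consumes the concatenated forward and backward outputs of the layer below. A minimal numpy sketch of that forward pass is below; the CTC/transducer output layer, regularisation, and all hyperparameters are omitted, and every name here is illustrative rather than taken from the paper:

```python
import numpy as np

def lstm_forward(x, params):
    """Run a single unidirectional LSTM layer over a sequence x of shape (T, input_dim)."""
    Wx, Wh, b = params                    # input weights, recurrent weights, bias
    T, _ = x.shape
    H = Wh.shape[0]                       # hidden size
    h, c = np.zeros(H), np.zeros(H)
    outputs = []
    for t in range(T):
        z = x[t] @ Wx + h @ Wh + b        # all four gate pre-activations in one matmul
        i, f, o, g = np.split(z, 4)
        i, f, o = (1.0 / (1.0 + np.exp(-v)) for v in (i, f, o))  # sigmoid gates
        c = f * c + i * np.tanh(g)        # cell state update
        h = o * np.tanh(c)                # hidden output
        outputs.append(h)
    return np.stack(outputs)

def init_params(rng, input_dim, hidden):
    """Small random weights for one LSTM direction (illustrative initialisation)."""
    scale = 1.0 / np.sqrt(input_dim)
    return (rng.standard_normal((input_dim, 4 * hidden)) * scale,
            rng.standard_normal((hidden, 4 * hidden)) * scale,
            np.zeros(4 * hidden))

def deep_bilstm(x, layer_params):
    """Stack bidirectional LSTM layers: each layer sees the concatenated
    forward and time-reversed backward outputs of the layer below."""
    for fwd, bwd in layer_params:
        h_fwd = lstm_forward(x, fwd)
        h_bwd = lstm_forward(x[::-1], bwd)[::-1]   # run backward in time, re-reverse
        x = np.concatenate([h_fwd, h_bwd], axis=1)
    return x
```

For TIMIT-style input, `x` would be a sequence of acoustic feature vectors (e.g. filterbank frames); the final layer's output would feed a softmax over phoneme labels trained with CTC or an RNN transducer, which this sketch does not implement.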


Recurrent Neural Networks for End-to-End Speech Recognition: A Comparative Analysis
TLDR
A comparative analysis of end-to-end speech recognition with different RNN architectures, including simple RNN cells (SRNN), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and bidirectional variants of each, compared on the LibriSpeech corpus.
Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition
TLDR
Novel LSTM based RNN architectures which make more effective use of model parameters to train acoustic models for large vocabulary speech recognition are presented.
Deep long short-term memory networks for speech recognition
TLDR
The experiments on 3rd CHiME challenge and Aurora-4 show that the stacks of hybrid model with FNN post-processor outperform stand-alone FNN and LSTM and the other hybrid models for robust speech recognition.
Deep Belief Neural Networks and Bidirectional Long-Short Term Memory Hybrid for Speech Recognition
TLDR
Results show that using the new DBNN-BLSTM hybrid as the acoustic model for Large Vocabulary Continuous Speech Recognition (LVCSR) increases word recognition accuracy, though the model has many parameters and in some cases may suffer performance issues in real-time applications.
Recurrent deep neural networks for robust speech recognition
TLDR
Full recurrent connections are added to a certain hidden layer of a conventional feedforward DNN, allowing the model to capture temporal dependency in deep representations and achieve state-of-the-art performance without front-end preprocessing, speaker-adaptive training, or multiple decoding passes.
Recent Trends in Application of Neural Networks to Speech Recognition
TLDR
This paper compares the train and test characteristic error rates of DNN, Recurrent Dynamic Neural Networks (RDNN), and Bi-Directional Deep Neural Network (BRDNN) models while roughly controlling for the total number of free parameters in the model.
End-to-End Online Speech Recognition with Recurrent Neural Networks
TLDR
An efficient GPU-based RNN training framework for the truncated backpropagation through time (BPTT) algorithm, which is suitable for online (continuous) training, and an online version of the connectionist temporal classification (CTC) loss computation algorithm, where the original CTC loss is estimated with a partial sliding window.
Performance Evaluation of Deep Neural Networks Applied to Speech Recognition: RNN, LSTM and GRU
TLDR
RNN, LSTM, and GRU networks are evaluated to compare their performance on a reduced TED-LIUM speech data set; the results show that LSTM achieves the best word error rates, while GRU optimization is faster and achieves word error rates close to LSTM.
Recurrent Neural Networks for Speech Recognition
TLDR
Recurrent neural networks for acoustic modeling which are unfolded in time for a fixed number of time steps with the property that the unfolded layers which correspond to the recurrent layer have time-shifted inputs and tied weight matrices are introduced.
Recurrent support vector machines for speech recognition
TLDR
This paper illustrates small but consistent advantages of replacing the softmax layer in RNNs with Support Vector Machines (SVMs), which are jointly learned using a sequence-level max-margin criterion instead of cross-entropy.

References

Showing 1–10 of 35 references
Revisiting Recurrent Neural Networks for robust ASR
TLDR
The Recurrent Neural Network is revisited, which explicitly models the Markovian dynamics of a set of observations through a non-linear function with a much larger hidden state space than traditional sequence models such as an HMM.
Sequence Transduction with Recurrent Neural Networks
TLDR
This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence.
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
TLDR
This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems of sequence learning and post-processing.
Deep Neural Networks for Acoustic Modeling in Speech Recognition
TLDR
This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.
Bidirectional recurrent neural networks
TLDR
It is shown how the proposed bidirectional structure can be easily modified to allow efficient estimation of the conditional posterior probability of complete symbol sequences without making any explicit assumption about the shape of the distribution.
Recurrent Neural Networks for Noise Reduction in Robust ASR
TLDR
This work introduces a model which uses a deep recurrent auto encoder neural network to denoise input features for robust ASR, and demonstrates the model is competitive with existing feature denoising approaches on the Aurora2 task, and outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.
Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition
TLDR
The proposed CNN architecture is applied to speech recognition within the framework of hybrid NN-HMM model to use local filtering and max-pooling in frequency domain to normalize speaker variance to achieve higher multi-speaker speech recognition performance.
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
TLDR
This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Tandem Connectionist Feature Extraction for Conversational Speech Recognition
TLDR
The paper shows that MLP transformations yield variables that have regular distributions, which can be further modified by using logarithm to make the distribution easier to model by a Gaussian-HMM.
Connectionist Speech Recognition: A Hybrid Approach
From the Publisher: Connectionist Speech Recognition: A Hybrid Approach describes the theory and implementation of a method to incorporate neural network approaches into state-of-the-art continuous speech recognition systems.