• Corpus ID: 6263878

Long short-term memory recurrent neural network architectures for large scale acoustic modeling

@inproceedings{Sak2014LongSM,
  title={Long short-term memory recurrent neural network architectures for large scale acoustic modeling},
  author={Hasim Sak and Andrew W. Senior and Françoise Beaufays},
  booktitle={INTERSPEECH},
  year={2014}
}
Long Short-Term Memory (LSTM) is a specific recurrent neural network (RNN) architecture that was designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. Here, we introduce the first distributed training of LSTM RNNs using asynchronous stochastic gradient descent optimization on a large cluster of machines.
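To make the gating mechanism mentioned in the abstract concrete, the following is a minimal sketch of a single LSTM forward step in plain NumPy. It illustrates only the standard LSTM cell equations, not the paper's production models (which, per the abstract, are trained with distributed asynchronous SGD on a large cluster); the function name lstm_step and the stacked weight layout are assumptions made for illustration.

    import numpy as np

    def lstm_step(x, h_prev, c_prev, W, b):
        """One standard LSTM time step (illustrative sketch).

        x      : (input_dim,)   current input frame (e.g. acoustic features)
        h_prev : (hidden_dim,)  previous hidden state
        c_prev : (hidden_dim,)  previous cell state
        W      : (4*hidden_dim, input_dim + hidden_dim) stacked gate weights
        b      : (4*hidden_dim,) stacked gate biases
        """
        hidden_dim = h_prev.shape[0]
        z = W @ np.concatenate([x, h_prev]) + b                  # all four gate pre-activations
        i = 1.0 / (1.0 + np.exp(-z[:hidden_dim]))                # input gate
        f = 1.0 / (1.0 + np.exp(-z[hidden_dim:2*hidden_dim]))    # forget gate
        o = 1.0 / (1.0 + np.exp(-z[2*hidden_dim:3*hidden_dim]))  # output gate
        g = np.tanh(z[3*hidden_dim:])                            # candidate cell update
        c = f * c_prev + i * g                                   # cell state carries long-range memory
        h = o * np.tanh(c)                                       # hidden state / cell output
        return h, c

    # Example: one step on random data
    rng = np.random.default_rng(0)
    input_dim, hidden_dim = 40, 8
    W = rng.standard_normal((4 * hidden_dim, input_dim + hidden_dim)) * 0.1
    b = np.zeros(4 * hidden_dim)
    h, c = lstm_step(rng.standard_normal(input_dim),
                     np.zeros(hidden_dim), np.zeros(hidden_dim), W, b)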

Citations

Compact Feedforward Sequential Memory Networks for Large Vocabulary Continuous Speech Recognition
TLDR
This work proposes a compact feedforward sequential memory network (cFSMN) by combining the FSMN with low-rank matrix factorization, and makes a slight modification to the encoding method used in FSMNs to further simplify the network architecture.
Long short-term memory recurrent neural network architectures for Urdu acoustic modeling
TLDR
LSTM architectures were compared with gated recurrent unit (GRU) based architectures and it was found that LSTM has an advantage over GRU.
Deep long short-term memory networks for speech recognition
TLDR
Experiments on the 3rd CHiME challenge and Aurora-4 show that stacks of the hybrid model with an FNN post-processor outperform stand-alone FNN and LSTM models as well as the other hybrid models for robust speech recognition.
On speaker adaptation of long short-term memory recurrent neural networks
TLDR
This paper observes that LSTM-RNNs can be effectively adapted by using a speaker-adaptive (SA) front-end or by inserting speaker-dependent (SD) layers, and proposes two adaptation approaches that implement the SD-layer-insertion idea specifically for LSTM-RNNs.
Deep LSTM for Large Vocabulary Continuous Speech Recognition
TLDR
This work introduces a training framework with layer-wise training and exponential moving average methods for deeper LSTM models, along with a novel transfer learning strategy using segmental minimum Bayes-risk, which makes it possible for training on only a small part of the dataset to outperform training on the full dataset from scratch.
Maxout neurons based deep bidirectional LSTM for acoustic modeling
TLDR
In this new acoustic model, maxout neurons are used in the fully-connected part of the DBLSTM to address the vanishing and exploding gradient problems, and a context-sensitive-chunk (CSC) back-propagation-through-time (BPTT) algorithm is proposed to train the DBLSTM network.
Constructing long short-term memory based deep recurrent neural networks for large vocabulary speech recognition
  • Xiangang Li, Xihong Wu
  • Computer Science
    2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2015
TLDR
Alternative deep LSTM architectures are proposed and empirically evaluated on a large vocabulary conversational telephone speech recognition task; experimental results demonstrate that the deep LSTM networks benefit from depth and yield state-of-the-art performance on this task.
High Order Recurrent Neural Networks for Acoustic Modelling
  • C. Zhang, P. Woodland
  • Computer Science
    2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2018
TLDR
This paper addresses the vanishing gradient problem using a high-order RNN (HORNN), which has additional connections from multiple previous time steps; speech recognition experiments showed that the proposed HORNN architectures with rectified linear unit and sigmoid activation functions reduced word error rates (WER) by 4.2% and 6.3%, respectively.
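To make the idea of "additional connections from multiple previous time steps" concrete, the following is a minimal sketch of a high-order recurrence in NumPy. It only illustrates the general HORNN idea summarized above; the function name hornn_step, the weight layout, and the ReLU choice are assumptions, not the cited paper's exact formulation.

    import numpy as np

    def hornn_step(x, h_hist, W_x, W_hs, b):
        """One high-order RNN step: the new hidden state depends on the K most
        recent hidden states, not just the last one.

        x      : (input_dim,)                        current input frame
        h_hist : list of K (hidden_dim,) vectors     recent hidden states, most recent first
        W_x    : (hidden_dim, input_dim)             input weights
        W_hs   : list of K (hidden_dim, hidden_dim)  recurrent weights, one per lag
        b      : (hidden_dim,)                       bias
        """
        pre = W_x @ x + b
        for W_h, h_prev in zip(W_hs, h_hist):
            pre += W_h @ h_prev               # extra connection from an earlier time step
        return np.maximum(pre, 0.0)           # ReLU activation (a sigmoid variant is also mentioned)

    # Example: one step with order K = 3
    rng = np.random.default_rng(0)
    input_dim, hidden_dim, K = 40, 8, 3
    W_x = rng.standard_normal((hidden_dim, input_dim)) * 0.1
    W_hs = [rng.standard_normal((hidden_dim, hidden_dim)) * 0.1 for _ in range(K)]
    h_hist = [np.zeros(hidden_dim) for _ in range(K)]
    h_new = hornn_step(rng.standard_normal(input_dim), h_hist, W_x, W_hs, np.zeros(hidden_dim))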
Sequence discriminative distributed training of long short-term memory recurrent neural networks
TLDR
This paper compares two sequence-discriminative criteria, maximum mutual information and state-level minimum Bayes risk, and investigates a number of variations of the basic training strategy to better understand issues raised by both the sequential model and the objective function.

References

Showing 1-10 of 26 references
Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition
TLDR
Novel LSTM-based RNN architectures that make more effective use of model parameters to train acoustic models for large vocabulary speech recognition are presented.
Speech recognition with deep recurrent neural networks
TLDR
This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long-range context that empowers RNNs.
Hybrid speech recognition with Deep Bidirectional LSTM
TLDR
The hybrid approach with DBLSTM appears to be well suited for tasks where acoustic modelling predominates, and the improvement in word error rate over the deep network is modest, despite a great increase in frame-level accuracy.
LSTM Neural Networks for Language Modeling
TLDR
This work analyzes the Long Short-Term Memory neural network architecture on an English and a large French language modeling task and gains considerable improvements in WER on top of a state-of-the-art speech recognition system.
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
TLDR
A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture trains the DNN to produce a distribution over senones (tied triphone states) as its output, and can significantly outperform conventional context-dependent Gaussian mixture model (GMM) HMMs.
Long Short-Term Memory
TLDR
A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Recurrent neural network based language model
TLDR
Results indicate that it is possible to obtain around a 50% reduction in perplexity by using a mixture of several RNN LMs, compared to a state-of-the-art backoff language model.
Training and Analysing Deep Recurrent Neural Networks
TLDR
This work studies the effect of a hierarchy of recurrent neural networks on processing time series, and shows that they reach state-of-the-art performance for recurrent networks in character-level language modelling when trained with stochastic gradient descent.
Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition
TLDR
This paper reports results of a DBN-pretrained context-dependent ANN/HMM system trained on two datasets that are much larger than any reported previously; the system outperforms the best Gaussian mixture model hidden Markov model (GMM-HMM) baseline.