Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

@article{Hinton2012DeepNN,
  title={Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups},
  author={Geoffrey E. Hinton and Li Deng and Dong Yu and George E. Dahl and Abdel-rahman Mohamed and Navdeep Jaitly and Andrew W. Senior and Vincent Vanhoucke and Patrick Nguyen and Tara N. Sainath and Brian Kingsbury},
  journal={IEEE Signal Processing Magazine},
  year={2012},
  volume={29},
  pages={82--97}
}
Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural…
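The alternative the abstract describes can be sketched as a small feed-forward network that maps a context window of acoustic frames to posterior probabilities over HMM states. The sketch below is a minimal illustration in plain NumPy with randomly initialized weights; the window size, coefficient count, hidden width, and state count are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch: a feed-forward acoustic model mapping a window of
# acoustic coefficient frames to a posterior distribution over HMM states.
import numpy as np

rng = np.random.default_rng(0)

N_FRAMES, N_COEFFS = 11, 13   # e.g. an 11-frame window of 13 coefficients (assumed)
HIDDEN, N_STATES = 256, 48    # hidden units and HMM states (assumed)

# Randomly initialized weights stand in for a trained model.
W1 = rng.standard_normal((N_FRAMES * N_COEFFS, HIDDEN)) * 0.01
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, N_STATES)) * 0.01
b2 = np.zeros(N_STATES)

def softmax(z):
    # Numerically stable softmax over HMM states.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def state_posteriors(window):
    """window: (N_FRAMES, N_COEFFS) array -> (N_STATES,) posterior vector."""
    h = np.maximum(0.0, window.reshape(-1) @ W1 + b1)  # ReLU hidden layer
    return softmax(h @ W2 + b2)

p = state_posteriors(rng.standard_normal((N_FRAMES, N_COEFFS)))
```

In a hybrid system these per-frame posteriors would be converted to scaled likelihoods and combined with the HMM's transition structure during decoding, replacing the GMM's role of scoring each state against the acoustic input.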
Noise Robust Speech Recognition Using Deep Belief Networks
TLDR: Deep Belief Networks (DBNs) are used to extract discriminative information from a larger window of frames than GMMs, and the results indicate that this new method of feature encoding results in much better word recognition accuracy.
Acoustic Modeling of Speech Signal using Artificial Neural Network: A Review of Techniques and Current Trends
TLDR: This chapter describes various techniques and works on Artificial Neural Network (ANN)-based acoustic modeling.
Deep neural networks with auxiliary Gaussian mixture models for real-time speech recognition
TLDR: Experiments on a large vocabulary speech recognition task show that both approaches improve recognition performance consistently and that the gains are mostly additive, resulting in about 5% relative improvement over the competitive DNN baseline in both Portuguese and English systems.
Deep segmental neural networks for speech recognition
TLDR: The deep segmental neural network (DSNN) is proposed, a segmental model that uses DNNs to estimate the acoustic scores of phonemic or sub-phonemic segments with variable lengths, which allows the DSNN to represent each segment as a single unit in which frames are made dependent on each other.
Convolutional Neural Networks for Speech Recognition
TLDR: It is shown that further error rate reduction can be obtained by using convolutional neural networks (CNNs), and a limited-weight-sharing scheme is proposed that can better model speech features.
Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends
TLDR: This article systematically reviews emerging speech generation approaches with the dual goal of helping readers gain a better understanding of the existing techniques and stimulating new work in the burgeoning area of deep learning for parametric speech generation.
Deep neural networks employing Multi-Task Learning and stacked bottleneck features for speech synthesis
TLDR: It is shown that the hidden representation used within a DNN can be improved through the use of Multi-Task Learning, and that stacking multiple frames of hidden layer activations (stacked bottleneck features) also leads to improvements.
An Analysis of Deep Neural Networks in Broad Phonetic Classes for Noisy Speech Recognition
TLDR: The experiments demonstrate that performance is still tightly related to the particular phonetic class, with stops and affricates being the least resilient, and that the relative improvements of both DNN variants are distributed unevenly across those classes, with the type of noise having a significant influence on the distribution.
Acoustic modeling in Automatic Speech Recognition - A Survey
  • A. Waris, R. Aggarwal
  • Computer Science
  • 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA)
  • 2018
TLDR: An overview of Hidden Markov Model, Deep Neural Network (DNN), and Convolutional Neural Network based models, which are the backbone of ASR systems, is presented.
A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition
TLDR: This paper studies the RNN encoder-decoder approach for large vocabulary end-to-end speech recognition, whereby an encoder transforms a sequence of acoustic vectors into a sequence of feature representations, from which a decoder recovers a sequence of words.

References

Showing 1–10 of 74 references
Deep Belief Networks using discriminative features for phone recognition
TLDR: Deep Belief Networks work even better when their inputs are speaker-adaptive, discriminative features, and on the standard TIMIT corpus they give phone error rates of 19.6% using monophone HMMs and a bigram language model.
Acoustic Modeling Using Deep Belief Networks
TLDR: It is shown that better phone recognition on the TIMIT dataset can be achieved by replacing Gaussian mixture models with deep neural networks that contain many layers of features and a very large number of parameters.
Deep Belief Networks for phone recognition
TLDR: Deep Belief Networks (DBNs) have recently proved to be very effective in a variety of machine learning problems, and this paper applies DBNs to acoustic modeling.
Speech Recognition Using Augmented Conditional Random Fields
TLDR: A new acoustic modeling paradigm based on augmented conditional random fields (ACRFs) is investigated and developed, which addresses some limitations of HMMs while maintaining many of the aspects that have made them successful.
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
TLDR: A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output can significantly outperform conventional context-dependent Gaussian mixture model (GMM)-HMMs.
Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling
  • Brian Kingsbury
  • Computer Science
  • 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
  • 2009
TLDR: This paper demonstrates that neural-network acoustic models can be trained with sequence classification criteria using exactly the same lattice-based methods that have been developed for Gaussian mixture HMMs, and that using a sequence classification criterion in training leads to considerably better performance.
Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition
TLDR: The proposed CNN architecture is applied to speech recognition within the framework of a hybrid NN-HMM model, using local filtering and max-pooling in the frequency domain to normalize speaker variance and achieve higher multi-speaker speech recognition performance.
Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition
TLDR: This paper reports results of a DBN-pretrained context-dependent ANN/HMM system trained on two datasets much larger than any reported previously, which outperforms the best Gaussian Mixture Model Hidden Markov Model baseline.
Tandem connectionist feature extraction for conventional HMM systems
TLDR: A large improvement in word recognition performance is shown by combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling.
Large scale discriminative training of hidden Markov models for speech recognition
TLDR: It is shown that HMMs trained with MMIE benefit as much as MLE-trained HMMs from applying model adaptation using maximum likelihood linear regression (MLLR), which has allowed the straightforward integration of MMIE-trained HMMs into complex multi-pass systems for transcription of conversational telephone speech.