Convolutional Neural Networks for Speech Recognition

@article{AbdelHamid2014ConvolutionalNN,
  title={Convolutional Neural Networks for Speech Recognition},
  author={Ossama Abdel-Hamid and Abdel-rahman Mohamed and Hui Jiang and Li Deng and Gerald Penn and Dong Yu},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  year={2014},
  volume={22},
  pages={1533--1545}
}
Recently, the hybrid deep neural network (DNN)-hidden Markov model (HMM) has been shown to significantly improve speech recognition performance over the conventional Gaussian mixture model (GMM)-HMM. [...] We first present a concise description of the basic CNN and explain how it can be used for speech recognition. We further propose a limited-weight-sharing scheme that can better model speech features.
Convolutional Neural Network and Feature Transformation for Distant Speech Recognition
TLDR
It is argued that transforming features could produce more discriminative features for the CNN, and hence improve the robustness of speech recognition against reverberation.
Advanced Convolutional Neural Network-Based Hybrid Acoustic Models for Low-Resource Speech Recognition
TLDR
Results are presented for combining CNNs and conventional RNNs with gate, highway, and residual networks to reduce the above problems, and the optimal neural network structures and training strategies for the proposed models are explored.
Performance Evaluation of Deep Convolutional Maxout Neural Network in Speech Recognition
  • Arash Dehghani, S. Seyyedsalehi
  • Computer Science, Engineering
  • 2018 25th National and 3rd International Iranian Conference on Biomedical Engineering (ICBME)
  • 2018
TLDR
The results obtained from the experiments show that the combined model (CMDNN) improves the performance of ANNs in speech recognition versus the pre-trained fully connected NNs with sigmoid neurons by about 3%.
Automatic Speech Recognition Using Deep Neural Networks: New Possibilities
TLDR
This dissertation proposes to use the CNN in a way that applies convolution and pooling operations along frequency to handle frequency variations that commonly happen due to speaker and pronunciation differences in speech signals.
Noise robust speech recognition using recent developments in neural networks for computer vision
TLDR
This paper considers two approaches recently developed for image classification and examines their impacts on noisy speech recognition performance, including the use of a Parametric Rectified Linear Unit (PReLU).
Adaptive windows multiple deep residual networks for speech recognition
Abstract: The hybrid convolutional neural network and hidden Markov model (CNN-HMM) has recently achieved considerable performance in speech recognition because deep neural networks, model complex [...]
Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition
TLDR
The proposed very deep CNNs can significantly reduce word error rate (WER) for noise robust speech recognition and are competitive with the long short-term memory recurrent neural networks (LSTM-RNN) acoustic model.
An analysis of convolutional neural networks for speech recognition
  • J. Huang, Jinyu Li, Y. Gong
  • Computer Science
  • 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2015
TLDR
By visualizing the localized filters learned in the convolutional layer, it is shown that edge detectors in varying directions can be automatically learned, and it is established that the CNN structure combined with maxout units is the most effective model under small-sizing constraints for the purpose of deploying small-footprint models to devices.
Deep Neural Network Based Speech Recognition Systems Under Noise Perturbations
TLDR
This work investigates the noise immunity of various neural network models through the speech recognition task and demonstrates that the phoneme error rate (PER) degrades as the signal-to-noise ratio (SNR) decreases across all evaluated neural network models.
Deep Residual Networks with Auditory Inspired Features for Robust Speech Recognition
The introduction of Deep Neural Networks (DNN) based acoustic models has become the new state of the art of speech recognition systems. The main reason for this is their lower recognition error rates [...]

References

Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition
TLDR
The proposed CNN architecture is applied to speech recognition within the framework of the hybrid NN-HMM model, using local filtering and max-pooling in the frequency domain to normalize speaker variance and achieve higher multi-speaker speech recognition performance.
Exploring convolutional neural network structures and optimization techniques for speech recognition
TLDR
This paper investigates several CNN architectures, including full and limited weight sharing, convolution along frequency and time axes, and stacking of several convolution layers, and develops a novel weighted softmax pooling layer so that the size in the pooled layer can be automatically learned.
Feature Learning in Deep Neural Networks - Studies on Speech Recognition Tasks.
TLDR
This paper argues that the improved accuracy achieved by the DNNs is the result of their ability to extract discriminative internal representations that are robust to the many sources of variability in speech signals, and shows that these representations become increasingly insensitive to small perturbations in the input with increasing network depth.
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
TLDR
This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Deep convolutional neural networks for LVCSR
TLDR
This paper determines the appropriate architecture to make CNNs effective compared to DNNs for LVCSR tasks, and explores the behavior of neural network features extracted from CNNs on a variety of LVCSR tasks, comparing CNNs to DNNs and GMMs.
Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM
TLDR
This paper presents the strategy of using mixed-bandwidth training data to improve wideband speech recognition accuracy in the CD-DNN-HMM framework, and shows that DNNs provide the flexibility of using arbitrary features.
A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion
We develop and present a novel deep convolutional neural network architecture, where heterogeneous pooling is used to provide constrained frequency-shift invariance in the speech spectrogram while [...]
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
TLDR
A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture is proposed that trains the DNN to produce a distribution over senones (tied triphone states) as its output and can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs.
Incoherent training of deep neural networks to de-correlate bottleneck features for speech recognition
TLDR
This paper proposes two novel incoherent training methods that explicitly de-correlate bottleneck (BN) features during DNN training and consistently surpass the state-of-the-art DNN/HMMs in all evaluated tasks.
Deep Belief Networks using discriminative features for phone recognition
TLDR
Deep Belief Networks work even better when their inputs are speaker adaptive, discriminative features, and on the standard TIMIT corpus, they give phone error rates of 19.6% using monophone HMMs and a bigram language model.