Convolutional Neural Networks for Speech Recognition
@article{AbdelHamid2014ConvolutionalNN, title={Convolutional Neural Networks for Speech Recognition}, author={Ossama Abdel-Hamid and Abdel-rahman Mohamed and Hui Jiang and Li Deng and Gerald Penn and Dong Yu}, journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, year={2014}, volume={22}, pages={1533-1545} }
Recently, the hybrid deep neural network (DNN)-hidden Markov model (HMM) has been shown to significantly improve speech recognition performance over the conventional Gaussian mixture model (GMM)-HMM. We first present a concise description of the basic CNN and explain how it can be used for speech recognition. We further propose a limited-weight-sharing scheme that can better model speech features.
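The frequency-axis convolution and the limited-weight-sharing idea from the abstract can be illustrated with a small sketch. The following is a minimal numpy example, not the authors' implementation; the feature-map shape, filter size, and the split into four frequency sections are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: one convolutional layer over the frequency axis of a
# log-mel feature map, comparing full weight sharing (the same filters reused
# at every frequency position) with limited weight sharing (each frequency
# section has its own filters). All shapes and sizes are assumptions.

n_bands, n_frames = 40, 15            # 40 mel bands, 15-frame context window
x = np.random.randn(n_bands, n_frames)

filt_size, n_filters = 8, 4           # each filter spans 8 adjacent bands

# Full weight sharing: one filter bank slides over all frequency positions.
w_full = np.random.randn(n_filters, filt_size, n_frames) * 0.1
full_out = np.stack([
    np.tensordot(x[b:b + filt_size], w_full, axes=([0, 1], [1, 2]))
    for b in range(n_bands - filt_size + 1)
])                                     # (positions, n_filters)

# Limited weight sharing: split the band axis into sections, each with its
# own filters; weights are shared only within a section.
n_sections = 4
section_bands = np.array_split(np.arange(n_bands), n_sections)
lws_out = []
for bands in section_bands:
    w_sec = np.random.randn(n_filters, filt_size, n_frames) * 0.1
    for b in range(len(bands) - filt_size + 1):
        seg = x[bands[b]:bands[b] + filt_size]
        lws_out.append(np.tensordot(seg, w_sec, axes=([0, 1], [1, 2])))
lws_out = np.stack(lws_out)

print(full_out.shape, lws_out.shape)   # (33, 4) and (12, 4)
```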
1,691 Citations
Convolutional Neural Network and Feature Transformation for Distant Speech Recognition
- Computer Science · International Journal of Electrical and Computer Engineering (IJECE)
- 2018
It is argued that transforming features could produce more discriminative features for CNN, and hence improve the robustness of speech recognition against reverberation.
Advanced Convolutional Neural Network-Based Hybrid Acoustic Models for Low-Resource Speech Recognition
- Computer Science · Comput.
- 2020
The results of contributions to combine CNN and conventional RNN with gate, highway, and residual networks to reduce the above problems are presented and the optimal neural network structures and training strategies for the proposed neural network models are explored.
Performance Evaluation of Deep Convolutional Maxout Neural Network in Speech Recognition
- Computer Science · 2018 25th National and 3rd International Iranian Conference on Biomedical Engineering (ICBME)
- 2018
The results obtained from the experiments show that the combined model (CMDNN) improves the performance of ANNs in speech recognition by about 3% over pre-trained fully connected NNs with sigmoid neurons.
Automatic Speech Recognition Using Deep Neural Networks: New Possibilities
- Computer Science
- 2014
This dissertation proposes to use the CNN in a way that applies convolution and pooling operations along frequency to handle frequency variations that commonly happen due to speaker and pronunciation differences in speech signals.
Noise robust speech recognition using recent developments in neural networks for computer vision
- Computer Science · 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2016
This paper considers two approaches recently developed for image classification and examines their impacts on noisy speech recognition performance, including the use of a Parametric Rectified Linear Unit (PReLU).
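As a rough illustration of the PReLU mentioned above, here is a minimal sketch; the negative-slope value is an assumed constant, whereas in practice it is a parameter learned during training.

```python
import numpy as np

# Parametric Rectified Linear Unit (PReLU) sketch: like ReLU, but the slope
# for negative inputs is a learnable parameter instead of zero. The value of
# `a` below is an illustrative assumption, not a trained value.

def prelu(x, a=0.25):
    return np.where(x > 0, x, a * x)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(prelu(z))   # [-0.5   -0.125  0.     1.5  ]
```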
Adaptive windows multiple deep residual networks for speech recognition
- Computer Science · Expert Syst. Appl.
- 2020
Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2016
The proposed very deep CNNs can significantly reduce word error rate (WER) for noise robust speech recognition and are competitive with the long short-term memory recurrent neural networks (LSTM-RNN) acoustic model.
An analysis of convolutional neural networks for speech recognition
- Computer Science · 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2015
By visualizing the localized filters learned in the convolutional layer, it is shown that edge detectors in varying directions can be learned automatically. It is also established that the CNN structure combined with maxout units is the most effective model under small-size constraints for deploying small-footprint models to devices.
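A minimal sketch of a maxout unit of the kind combined with the CNN above; the layer sizes and the number of pieces k are illustrative assumptions, not values from the cited paper.

```python
import numpy as np

# Maxout unit sketch: each output takes the maximum over k linear "pieces",
# so the non-linearity is effectively learned rather than fixed.
# All dimensions below are assumptions for illustration.

def maxout(x, W, b):
    # W: (k, out_dim, in_dim), b: (k, out_dim)
    z = np.einsum('koi,i->ko', W, x) + b   # (k, out_dim) linear pieces
    return z.max(axis=0)                    # element-wise max over the pieces

rng = np.random.default_rng(0)
x = rng.standard_normal(8)                  # 8-dimensional input
W = rng.standard_normal((3, 5, 8)) * 0.1    # k=3 pieces, 5 outputs
b = np.zeros((3, 5))
print(maxout(x, W, b).shape)                # (5,)
```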
Deep Residual Networks with Auditory Inspired Features for Robust Speech Recognition
- Computer Science
- 2017
A Deep Residual Network architecture is proposed that allows ResNets to be used in speech recognition tasks where the network input is small compared with the image dimensions for which ResNets were originally designed. A modification of the well-known Power-Normalized Cepstral Coefficients is also introduced as input to the ResNet, with the aim of creating a noise-invariant representation of the acoustic space.
Recurrent convolutional neural network for speech processing
- Computer Science · 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017
A recently developed deep learning model, the recurrent convolutional neural network (RCNN), is proposed for speech processing; it inherits some merits of recurrent neural networks (RNNs) and convolutional neural networks (CNNs) and is competitive with previous methods in terms of accuracy and efficiency.
References
Showing 1-10 of 48 references
Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition
- Computer Science · 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2012
The proposed CNN architecture is applied to speech recognition within the framework of the hybrid NN-HMM model, using local filtering and max-pooling in the frequency domain to normalize speaker variance and achieve higher multi-speaker speech recognition performance.
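A small numpy sketch of local filtering followed by max-pooling along the frequency axis; the number of bands, filter width, and pooling size are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

# Illustrative sketch: convolve a local filter along the frequency axis of a
# log-mel feature vector, then max-pool adjacent frequency positions. The
# pooling absorbs small frequency shifts (e.g., from speaker differences).
# All sizes are assumptions for illustration.

n_bands = 40
x = np.random.randn(n_bands)           # one frame of 40 log-mel bands

filt = np.random.randn(8) * 0.1        # local filter spanning 8 bands
conv = np.array([np.dot(x[b:b + 8], filt) for b in range(n_bands - 8 + 1)])

pool_size = 3                          # max-pool over 3 adjacent positions
n_valid = len(conv) - len(conv) % pool_size
pooled = conv[:n_valid].reshape(-1, pool_size).max(axis=1)

print(conv.shape, pooled.shape)        # (33,) -> (11,)
```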
Exploring convolutional neural network structures and optimization techniques for speech recognition
- Computer Science · INTERSPEECH
- 2013
This paper investigates several CNN architectures, including full and limited weight sharing, convolution along frequency and time axes, and stacking of several convolution layers, and develops a novel weighted softmax pooling layer so that the pooling size can be learned automatically.
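The exact weighted softmax pooling formulation is not reproduced here; the sketch below shows one generic way to pool with softmax-normalized learnable weights over a window, purely as an assumed illustration of the idea, not the paper's formulation.

```python
import numpy as np

# Generic softmax-weighted pooling over a window of convolution outputs.
# Learnable logits `w` are softmax-normalized into weights that interpolate
# between average pooling (equal weights) and max pooling (one weight near 1).
# This is an assumed stand-in, not the cited paper's exact layer.

def softmax_pool(window, w):
    weights = np.exp(w) / np.exp(w).sum()
    return np.dot(weights, window)

window = np.array([0.2, 1.5, 0.7])                    # activations at 3 positions
print(softmax_pool(window, np.zeros(3)))              # equal weights ~ average pooling
print(softmax_pool(window, np.array([0., 8., 0.])))   # peaked weights ~ max pooling
```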
Feature Learning in Deep Neural Networks - Studies on Speech Recognition Tasks.
- Computer Science · ICLR 2013
- 2013
This paper argues that the improved accuracy achieved by the DNNs is the result of their ability to extract discriminative internal representations that are robust to the many sources of variability in speech signals, and shows that these representations become increasingly insensitive to small perturbations in the input with increasing network depth.
Deep convolutional neural networks for LVCSR
- Computer Science · 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
This paper determines the appropriate architecture to make CNNs effective compared to DNNs for LVCSR tasks, and explores the behavior of neural network features extracted from CNNs on a variety of LVCSR tasks, comparing CNNs to DNNs and GMMs.
Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM
- Computer Science · 2012 IEEE Spoken Language Technology Workshop (SLT)
- 2012
This paper presents the strategy of using mixed-bandwidth training data to improve wideband speech recognition accuracy in the CD-DNN-HMM framework, and shows that DNNs provide the flexibility of using arbitrary features.
A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion
- Computer Science · 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
We develop and present a novel deep convolutional neural network architecture, where heterogeneous pooling is used to provide constrained frequency-shift invariance in the speech spectrogram while…
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
- Computer Science · IEEE Transactions on Audio, Speech, and Language Processing
- 2012
A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture is proposed that trains the DNN to produce a distribution over senones (tied triphone states) as its output and can significantly outperform conventional context-dependent Gaussian mixture model (GMM)-HMMs.
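In the standard hybrid setup described here, the DNN's softmax output over senones is converted to scaled likelihoods for HMM decoding by dividing by the senone priors; the sketch below illustrates that conversion with made-up numbers.

```python
import numpy as np

# Hybrid DNN-HMM scoring sketch: the DNN outputs posteriors p(s|x) over
# senones; decoding uses scaled likelihoods p(x|s) ~ p(s|x) / p(s), where
# p(s) are senone priors estimated from the training alignments.
# All numbers here are made up for illustration.

logits = np.array([2.0, 0.5, -1.0, 0.0])            # DNN outputs for 4 senones
posteriors = np.exp(logits) / np.exp(logits).sum()  # softmax: p(s|x)

priors = np.array([0.4, 0.3, 0.2, 0.1])             # p(s) from alignments
scaled_likelihoods = posteriors / priors            # used as HMM emission scores

print(posteriors.round(3), scaled_likelihoods.round(3))
```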
Incoherent training of deep neural networks to de-correlate bottleneck features for speech recognition
- Computer Science · 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
This paper proposes two novel incoherent training methods that explicitly de-correlate bottleneck (BN) features during DNN learning and consistently surpass state-of-the-art DNN/HMMs in all evaluated tasks.
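One generic way to encourage de-correlated bottleneck features is to penalize the off-diagonal entries of their correlation matrix; the sketch below shows such a penalty purely as an assumed illustration, not the paper's specific incoherent training objective.

```python
import numpy as np

# Assumed illustration: a de-correlation penalty on a batch of bottleneck
# activations, computed as the mean squared off-diagonal correlation.
# This is a generic stand-in, not the cited incoherent training criterion.

def decorrelation_penalty(H):
    # H: (batch, bottleneck_dim) activations
    corr = np.corrcoef(H, rowvar=False)          # (dim, dim) correlation matrix
    off_diag = corr - np.diag(np.diag(corr))     # zero out the diagonal
    return (off_diag ** 2).mean()

H = np.random.randn(64, 10)                      # 64 frames, 10-dim bottleneck
print(decorrelation_penalty(H))
```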
Deep Belief Networks using discriminative features for phone recognition
- Computer Science · 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2011
Deep Belief Networks work even better when their inputs are speaker-adaptive, discriminative features; on the standard TIMIT corpus, they give phone error rates of 19.6% using monophone HMMs and a bigram language model.
Investigation of deep neural networks (DNN) for large vocabulary continuous speech recognition: Why DNN surpasses GMMs in acoustic modeling
- Computer Science · 2012 8th International Symposium on Chinese Spoken Language Processing
- 2012
This paper investigates DNNs for several large vocabulary speech recognition tasks and proposes a few ideas to reconfigure the DNN input features, such as using logarithm spectrum features or VTLN-normalized features in the DNN.