Deep convolutional neural networks for LVCSR

@article{Sainath2013DeepCN,
  title={Deep convolutional neural networks for LVCSR},
  author={Tara N. Sainath and Abdel-rahman Mohamed and Brian Kingsbury and Bhuvana Ramabhadran},
  journal={2013 IEEE International Conference on Acoustics, Speech and Signal Processing},
  year={2013},
  pages={8614-8618}
}
Convolutional Neural Networks (CNNs) are an alternative type of neural network that can be used to reduce spectral variations and model spectral correlations that exist in signals. [...] Key Method: Specifically, we focus on how many convolutional layers are needed, what the optimal number of hidden units is, what the best pooling strategy is, and the best input feature type for CNNs. We then explore the behavior of neural network features extracted from CNNs on a variety of LVCSR tasks, comparing CNNs to DNNs…
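The design questions listed above (convolutional depth, hidden-unit counts, pooling strategy, input features) can be made concrete with a small sketch. The following is a minimal, hypothetical CNN acoustic-model front end in PyTorch; all layer sizes and the 40×11 log-mel input patch are illustrative assumptions, not the configuration the paper settles on.

```python
# Minimal sketch of a CNN acoustic-model front end.
# All sizes are hypothetical; the paper explores these choices empirically.
import torch
import torch.nn as nn

class CnnAcousticModel(nn.Module):
    def __init__(self, n_mels=40, context=11, n_senones=512):
        super().__init__()
        self.conv = nn.Sequential(
            # Convolve over a (frequency, time) patch of log-mel features.
            nn.Conv2d(1, 128, kernel_size=(9, 9)), nn.ReLU(),
            # Pool in frequency only, to reduce spectral variation.
            nn.MaxPool2d(kernel_size=(3, 1)),
            nn.Conv2d(128, 256, kernel_size=(4, 3)), nn.ReLU(),
        )
        # Fully connected layers on top, as in hybrid CNN/DNN systems.
        with torch.no_grad():
            flat = self.conv(torch.zeros(1, 1, n_mels, context)).numel()
        self.mlp = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat, 1024), nn.ReLU(),
            nn.Linear(1024, n_senones),  # senone posteriors (logits)
        )

    def forward(self, x):  # x: (batch, 1, n_mels, context)
        return self.mlp(self.conv(x))

logits = CnnAcousticModel()(torch.randn(8, 1, 40, 11))
print(logits.shape)  # torch.Size([8, 512])
```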
Deep Convolutional Neural Networks for Large-scale Speech Tasks
TLDR
This paper determines the appropriate architecture to make CNNs effective compared to DNNs for LVCSR tasks, and investigates how to incorporate speaker-adapted features, which cannot directly be modeled by CNNs as they do not obey locality in frequency, into the CNN framework.
Improvements to Deep Convolutional Neural Networks for LVCSR
TLDR
A deep analysis comparing limited weight sharing and full weight sharing with state-of-the-art features is conducted, and an effective strategy for using dropout during Hessian-free sequence training is introduced.
Very deep multilingual convolutional neural networks for LVCSR
TLDR
A very deep convolutional network architecture with up to 14 weight layers and small 3×3 kernels, inspired by the VGG ImageNet 2014 architecture, is introduced, along with multilingual CNNs with multiple untied layers.
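The VGG-inspired design referenced here trades large convolutional kernels for stacks of small 3×3 ones. A minimal sketch of that pattern, with hypothetical channel counts (the cited work stacks such layers up to 14 deep):

```python
# Sketch of a VGG-style block: two stacked 3x3 convolutions followed by
# pooling. Channel counts are hypothetical, not the paper's configuration.
import torch.nn as nn

def vgg_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(kernel_size=2),
    )

# Stacking blocks doubles the channels while halving the feature map,
# mirroring the VGG ImageNet recipe adapted to spectrogram input.
deep_cnn = nn.Sequential(vgg_block(1, 64), vgg_block(64, 128), vgg_block(128, 256))
```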
An analysis of convolutional neural networks for speech recognition
  • J. Huang, Jinyu Li, Y. Gong
  • Computer Science
  • 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2015
TLDR
By visualizing the localized filters learned in the convolutional layer, it is shown that edge detectors in varying directions can be learned automatically, and it is established that the CNN structure combined with maxout units is the most effective model under small-sizing constraints for deploying small-footprint models to devices.
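Maxout units, mentioned in this summary, replace a fixed nonlinearity with the maximum over k affine pieces, which is what makes them attractive under tight parameter budgets. A minimal sketch, with hypothetical sizes:

```python
# Minimal maxout layer: k parallel linear maps, elementwise max across them.
import torch
import torch.nn as nn

class Maxout(nn.Module):
    def __init__(self, in_features, out_features, k=2):
        super().__init__()
        self.k = k
        self.linear = nn.Linear(in_features, out_features * k)

    def forward(self, x):
        y = self.linear(x)                     # (batch, out * k)
        y = y.view(*y.shape[:-1], -1, self.k)  # (batch, out, k)
        return y.max(dim=-1).values            # max over the k pieces

out = Maxout(256, 128, k=3)(torch.randn(4, 256))
print(out.shape)  # torch.Size([4, 128])
```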
Very deep convolutional neural networks for LVCSR
TLDR
Very deep CNNs are introduced for the LVCSR task by extending the depth of the convolutional layers up to ten, and a better way to perform convolution operations along the temporal dimension is proposed.
Phone recognition with hierarchical convolutional deep maxout networks
  • L. Tóth
  • Computer Science
  • EURASIP J. Audio Speech Music Process.
  • 2015
TLDR
It is shown that with the hierarchical modelling approach, the CNN can reduce the error rate of the network on an expanded input context, and it is found that all the proposed modelling improvements give consistently better results on this larger database as well.
Convolutional Neural Networks for Speech Recognition
TLDR
It is shown that further error rate reduction can be obtained by using convolutional neural networks (CNNs), and a limited-weight-sharing scheme that can better model speech features is proposed.
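Limited weight sharing, the scheme proposed in this work, ties convolution filters only within a local frequency band, on the premise that low- and high-frequency patterns differ. A rough sketch of the idea; the band count and kernel size are assumptions for illustration:

```python
# Rough sketch of limited weight sharing: a separate convolution per
# frequency band, so filters are shared within a band but not across bands.
import torch
import torch.nn as nn

class LimitedWeightSharingConv(nn.Module):
    def __init__(self, bands=4, in_ch=1, out_ch=32, kernel=(5, 5)):
        super().__init__()
        self.bands = bands
        self.convs = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel) for _ in range(bands)
        )

    def forward(self, x):  # x: (batch, ch, freq, time)
        chunks = torch.chunk(x, self.bands, dim=2)  # split the frequency axis
        return torch.cat(
            [conv(c) for conv, c in zip(self.convs, chunks)], dim=2
        )

y = LimitedWeightSharingConv()(torch.randn(2, 1, 40, 15))
print(y.shape)  # torch.Size([2, 32, 24, 11])
```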
A Hybrid of Deep CNN and Bidirectional LSTM for Automatic Speech Recognition
TLDR
A hybrid CNN-BLSTM architecture is proposed to exploit both the spatial and the temporal properties of the speech signal, improving continuous speech recognition and addressing another shortcoming of CNNs: speaker-adapted features cannot be modeled directly in a CNN.
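The hybrid described here uses convolution to capture local spectral structure and a bidirectional LSTM to model temporal dynamics. A minimal sketch of that data flow, with hypothetical dimensions throughout:

```python
# Minimal CNN-BLSTM sketch: convolve over (freq, time), then treat the
# remaining time axis as a sequence for a bidirectional LSTM.
import torch
import torch.nn as nn

class CnnBlstm(nn.Module):
    def __init__(self, n_mels=40, n_out=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=(5, 3), padding=(0, 1)), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),  # pool in frequency only
        )
        freq_out = (n_mels - 5 + 1) // 2
        self.blstm = nn.LSTM(64 * freq_out, 256, bidirectional=True,
                             batch_first=True)
        self.out = nn.Linear(2 * 256, n_out)

    def forward(self, x):  # x: (batch, 1, n_mels, frames)
        f = self.conv(x)                      # (batch, 64, freq', frames)
        f = f.permute(0, 3, 1, 2).flatten(2)  # (batch, frames, 64 * freq')
        h, _ = self.blstm(f)
        return self.out(h)                    # per-frame outputs

y = CnnBlstm()(torch.randn(2, 1, 40, 100))
print(y.shape)  # torch.Size([2, 100, 512])
```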
Advances in Very Deep Convolutional Neural Networks for LVCSR
TLDR
This paper proposes a new CNN design without time padding and without time pooling, which is slightly suboptimal for accuracy but has two significant advantages: it enables sequence training and deployment by allowing efficient convolutional evaluation of full utterances, and it allows batch normalization to be straightforwardly applied to CNNs on sequence data.
Convolutional Neural Network for ASR
  • Sourav Newatia, R. Aggarwal
  • Computer Science
  • 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA)
  • 2018
TLDR
This paper provides a broad survey of the different types of pooling techniques applied in CNN architectures, covering key properties of CNNs and their architecture, and summarizes the improvements to CNNs.

References

SHOWING 1-10 OF 17 REFERENCES
Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition
TLDR
The proposed CNN architecture is applied to speech recognition within the framework of a hybrid NN-HMM model, using local filtering and max-pooling in the frequency domain to normalize speaker variance and achieve higher multi-speaker speech recognition performance.
Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition
TLDR
This paper reports results for a DBN-pretrained context-dependent ANN/HMM system trained on two datasets that are much larger than any reported previously, which outperforms the best Gaussian mixture model hidden Markov model (GMM-HMM) baseline.
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
TLDR
A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output, and that can significantly outperform conventional context-dependent Gaussian mixture model (GMM) HMMs.
Face recognition: a convolutional neural-network approach
TLDR
A hybrid neural network for human face recognition that compares favourably with other methods is presented; the paper also analyzes the computational complexity and discusses how new classes could be added to the trained recognizer.
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
TLDR
This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Auto-encoder bottleneck features using deep belief networks
TLDR
The experiments indicate that with the AE-BN architecture, pre-trained and deeper NNs produce better AE-BN features, and that system combination of the GMM/HMM baseline and AE-BN systems provides an additional 0.5% absolute improvement on a larger Broadcast News task.
Making Deep Belief Networks effective for large vocabulary continuous speech recognition
TLDR
This paper explores the performance of DBNs in a state-of-the-art LVCSR system, showing improvements over Multi-Layer Perceptrons (MLPs) and GMM/HMMs across a variety of features on an English Broadcast News task.
Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization
TLDR
A distributed neural network training algorithm, based on Hessian-free optimization, that scales to deep networks and large data sets and yields relative reductions in word error rate of 7–13% over cross-entropy training with stochastic gradient descent on two larger tasks: Switchboard and DARPA RATS noisy Levantine Arabic.
Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling
  • Brian Kingsbury
  • Computer Science
  • 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
  • 2009
TLDR
This paper demonstrates that neural-network acoustic models can be trained with sequence classification criteria using exactly the same lattice-based methods that have been developed for Gaussian mixture HMMs, and that using a sequence classification criterion in training leads to considerably better performance.