Corpus ID: 6299001

Increasing Deep Neural Network Acoustic Model Size for Large Vocabulary Continuous Speech Recognition

@article{Maas2014IncreasingDN,
  title={Increasing Deep Neural Network Acoustic Model Size for Large Vocabulary Continuous Speech Recognition},
  author={Andrew L. Maas and Awni Y. Hannun and Christopher T. Lengerich and Peng Qi and Dan Jurafsky and A. Ng},
  journal={ArXiv},
  year={2014},
  volume={abs/1406.7806}
}
Deep neural networks (DNNs) are now a central component of nearly all state-of-the-art speech recognition systems. Part of the promise of DNNs is their ability to represent increasingly complex functions as the number of DNN parameters increases. This paper investigates the performance of DNN-based hybrid speech recognition systems as DNN model size and training data increase. Using a distributed GPU architecture, we train DNN acoustic models roughly an order of magnitude larger than those… 
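As a rough illustration of how model size grows with layer width in fully connected acoustic models of this kind, the sketch below counts parameters for a DNN mapping stacked acoustic features to HMM tied-state (senone) posteriors; the feature dimension, depth, width, and senone count are illustrative assumptions, not the paper's configuration.

```python
def dnn_param_count(input_dim, hidden_dim, num_hidden_layers, num_senones):
    """Total weights + biases of a fully connected DNN acoustic model
    mapping acoustic feature windows to HMM tied-state (senone) posteriors."""
    dims = [input_dim] + [hidden_dim] * num_hidden_layers + [num_senones]
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims[:-1], dims[1:]))

# Illustrative sizes only: a 5-hidden-layer net grows roughly quadratically
# in hidden width, which is how "order of magnitude larger" models arise.
for width in (512, 2048, 4096):
    n = dnn_param_count(input_dim=440, hidden_dim=width,
                        num_hidden_layers=5, num_senones=8000)
    print(f"hidden width {width}: {n / 1e6:.1f}M parameters")
```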

Citations

Asynchronous Decentralized Distributed Training of Acoustic Models
TLDR
It is shown that ADPSGD with fixed and randomized communication patterns copes well with slow learners, and that the delay-by-one strategy gives the fastest convergence with large batches.
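The core mechanism behind ADPSGD-style training is decentralized mixing: each learner averages its parameters only with a neighbor rather than through a global all-reduce. The numpy sketch below is an illustration of that mixing step, not the paper's implementation; the learner count and toy model size are arbitrary.

```python
import numpy as np

def ring_mixing_step(models, offset=1):
    """One decentralized mixing round: learner i averages its parameters with
    learner (i + offset) % n, rather than with a global all-reduce."""
    n = len(models)
    return [0.5 * (models[i] + models[(i + offset) % n]) for i in range(n)]

rng = np.random.default_rng(0)
models = [rng.normal(size=4) for _ in range(8)]   # 8 learners, toy 4-dim "model"
for _ in range(20):                               # repeated mixing drives consensus
    models = ring_mixing_step(models)
print(np.std(np.stack(models), axis=0))           # spread across learners shrinks
```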
DEEP RECURRENT NEURAL NETWORK BASED AUDIO SPEECH RECOGNITION SYSTEM
TLDR
This paper presents an approach to isolated word recognition based on deep learning models, in particular Recurrent Neural Networks (RNNs), which can perform end-to-end speech recognition without any assumption about the structure of the data using Bidirectional LSTMs (BiLSTMs).
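A minimal PyTorch sketch of the classifier shape such a system uses, assuming a fixed inventory of isolated words and MFCC-like input frames; the layer sizes and mean-pooling choice are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class BiLSTMWordClassifier(nn.Module):
    """Frames (batch, time, feat) -> word posteriors (batch, num_words)."""
    def __init__(self, feat_dim=39, hidden=128, num_words=10):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_words)

    def forward(self, frames):
        seq, _ = self.rnn(frames)          # (batch, time, 2*hidden)
        pooled = seq.mean(dim=1)           # average over time: one label per utterance
        return self.out(pooled)

logits = BiLSTMWordClassifier()(torch.randn(4, 120, 39))  # 4 utterances, 120 frames
print(logits.shape)                                        # torch.Size([4, 10])
```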
Spark-Based Parallel Deep Neural Network Model for Classification of Large Scale RNAs into piRNAs and Non-piRNAs
TLDR
A computational model based on a parallel deep neural network is presented for timely classification of large numbers of RNA sequences into piRNAs and non-piRNAs, taking advantage of a parallel and distributed computing platform.
Efficient Keyword Spotting through Hardware-Aware Conditional Execution of Deep Neural Networks
TLDR
Results show the framework can generate cascade models optimized as a function of the class distribution, reducing computational cost by 87% for always-on operation while maintaining the baseline accuracy of the most complex model in the cascade.
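A sketch of the cascade idea under assumed confidence thresholds: a cheap model handles most inputs and a larger model runs only when the cheap one is uncertain. The models and threshold here are placeholders, not the paper's.

```python
def cascade_predict(x, small_model, large_model, threshold=0.9):
    """Run the cheap model first; fall back to the expensive one only when
    the cheap model's confidence is below the threshold."""
    label, confidence = small_model(x)
    if confidence >= threshold:
        return label            # early exit: large model never evaluated
    return large_model(x)[0]    # conditional execution of the costly stage

# Toy stand-ins for the two stages of the cascade.
small = lambda x: ("keyword" if x > 0.8 else "background", abs(x - 0.5) * 2)
large = lambda x: ("keyword" if x > 0.6 else "background", 1.0)
print([cascade_predict(v, small, large) for v in (0.05, 0.55, 0.95)])
```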
CAVBench: A Benchmark Suite for Connected and Autonomous Vehicles
TLDR
CAVBench is the first benchmark suite for edge computing systems in the connected and autonomous vehicle (CAV) scenario; it provides quantitative evaluation results via application- and system-perspective output metrics and is used to evaluate a typical edge computing platform.
Unifying Data, Model and Hybrid Parallelism in Deep Learning via Tensor Tiling
TLDR
This work proposes an algorithm that finds the tiling that partitions tensors with the least overall communication, and builds the SoyBean system, which automatically transforms a serial dataflow graph captured by an existing deep learning system frontend into a parallel dataflow graph based on the optimal tiling it has found.
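A worked toy example of why the tiling choice changes communication cost (this is an illustration, not SoyBean's algorithm): for C = A @ B split across workers, partitioning along different axes replicates different operands. The shapes and worker count below are assumptions.

```python
def comm_bytes_row_partition(m, k, n, workers, bytes_per_elem=4):
    """Tile A by rows: each worker holds an (m/workers, k) slice of A but
    needs all of B, so B is replicated (communicated) to every worker."""
    return workers * k * n * bytes_per_elem

def comm_bytes_col_partition(m, k, n, workers, bytes_per_elem=4):
    """Tile B by columns: each worker holds a (k, n/workers) slice of B but
    needs all of A, so A is replicated instead."""
    return workers * m * k * bytes_per_elem

m, k, n, w = 10_000, 2_000, 50_000, 8    # tall-and-wide matmul, 8 workers
print(comm_bytes_row_partition(m, k, n, w) / 1e9, "GB (replicate B)")
print(comm_bytes_col_partition(m, k, n, w) / 1e9, "GB (replicate A)")
# The cheaper tiling depends on the tensor shapes, which is exactly the
# search space an automatic tiling optimizer explores.
```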
Recent advances in LVCSR : A benchmark comparison of performances
TLDR
A benchmark comparison of the performance of current state-of-the-art LVCSR systems over different speech recognition tasks shows that Deep Neural Networks and Convolutional Neural Networks have proven their efficiency on several LVCSR tasks by outperforming traditional Hidden Markov Model and Gaussian Mixture Model systems.
A novel path planning method for biomimetic robot based on deep learning
TLDR
A new deep-learning-based method for biomimetic robot path planning is proposed, using max-pooling layers and convolutional kernels; the deep neural network outperforms the conventional method in both dynamic and static environments.
Automatic Speech Recognition with Deep Neural Networks for Impaired Speech
TLDR
A comparison between architectures shows that, even with a small database, hybrid DNN-HMM models outperform classical GMM-HMM models according to word error rate measures.
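Since several of these results are reported as word error rate (WER), here is a minimal sketch of the standard computation: word-level edit distance between reference and hypothesis divided by reference length. The example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[-1][-1] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # ~0.167
```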

References

Showing 1–10 of 23 references
Improving deep neural networks for LVCSR using rectified linear units and dropout
TLDR
Modelling deep neural networks with rectified linear unit (ReLU) non-linearities with minimal human hyper-parameter tuning on a 50-hour English Broadcast News task shows a 4.2% relative improvement over a DNN trained with sigmoid units, and a 14.4% relative improvement over a strong GMM/HMM system.
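A minimal numpy sketch, with made-up layer sizes, of the two ingredients the title refers to: a rectified-linear hidden layer and inverted dropout applied to its activations during training.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_dropout_layer(x, W, b, keep_prob=0.8, training=True):
    """Affine transform -> ReLU -> inverted dropout (identity at test time)."""
    h = np.maximum(0.0, x @ W + b)            # rectified linear units
    if training:
        mask = rng.binomial(1, keep_prob, size=h.shape) / keep_prob
        h = h * mask                          # drop units, rescale survivors
    return h

x = rng.normal(size=(16, 440))                # batch of 16 feature vectors
W = rng.normal(scale=0.01, size=(440, 1024))  # illustrative layer shape
b = np.zeros(1024)
print(relu_dropout_layer(x, W, b).shape)      # (16, 1024)
```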
On rectified linear units for speech processing
TLDR
This work shows that it can improve generalization and make training of deep networks faster and simpler by substituting the logistic units with rectified linear units.
Rectifier Nonlinearities Improve Neural Network Acoustic Models
TLDR
This work explores the use of deep rectifier networks as acoustic models for the 300-hour Switchboard conversational speech recognition task, and analyzes hidden layer representations to quantify differences in how ReL units encode inputs as compared to sigmoidal units.
Building high-level features using large scale unsupervised learning
TLDR
Contrary to what appears to be a widely-held intuition, the experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not.
Deep learning with COTS HPC systems
TLDR
This paper presents technical details and results from their own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI, and shows that it can scale to networks with over 11 billion parameters using just 16 machines.
Dropout Training as Adaptive Regularization
TLDR
By casting dropout as regularization, this work develops a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer and consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.
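A quick numerical check (an illustration, not from the paper) of the adaptive-regularization view in the simplest case, squared loss on a linear model: the expected dropout loss equals the ordinary loss plus a data-dependent, ridge-like penalty. The sizes and keep probability below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)        # one example's features
w = rng.normal(size=20)        # linear model weights
y, p = 1.0, 0.8                # target and keep probability

# Monte Carlo estimate of the expected squared loss under (inverted) dropout.
masks = rng.binomial(1, p, size=(200_000, x.size)) / p
mc_loss = np.mean((y - masks @ (w * x)) ** 2)

# Closed form: ordinary squared loss plus a data-dependent ridge-like penalty,
# which is the "adaptive regularization" view of dropout (for squared loss).
penalty = (1 - p) / p * np.sum(w ** 2 * x ** 2)
closed_form = (y - w @ x) ** 2 + penalty
print(mc_loss, closed_form)    # the two agree up to Monte Carlo noise
```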
ICML, 2013
Improvements to Deep Convolutional Neural Networks for LVCSR
TLDR
A deep analysis comparing limited weight sharing and full weight sharing with state-of-the-art features is conducted and an effective strategy to use dropout during Hessian-free sequence training is introduced.
Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription
TLDR
Recent improvements to YouTube's original system for automatic generation of closed captions are described, in particular the use of owner-uploaded video transcripts to generate additional semi-supervised training data and deep neural network acoustic models with large state inventories.
Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets
TLDR
A low-rank matrix factorization of the final weight layer is proposed and applied to DNNs for both acoustic modeling and language modeling, showing an equivalent reduction in training time with no significant loss in final recognition accuracy compared to a full-rank representation.
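A small sketch, with assumed layer sizes, of the parameter saving from factoring the final weight matrix: a hidden-to-senone matrix W (h × s) is replaced by two matrices of rank r, one h × r and one r × s.

```python
def final_layer_params(hidden, senones, rank=None):
    """Parameter count of the last weight layer, full-rank or factored."""
    if rank is None:
        return hidden * senones            # full-rank W: hidden x senones
    return hidden * rank + rank * senones  # low-rank A @ B with inner dim `rank`

h, s, r = 2048, 9300, 256                  # illustrative sizes only
full, low = final_layer_params(h, s), final_layer_params(h, s, r)
print(f"full-rank: {full/1e6:.1f}M  low-rank: {low/1e6:.1f}M  "
      f"({100 * (1 - low/full):.0f}% fewer parameters in this layer)")
```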