Increasing Deep Neural Network Acoustic Model Size for Large Vocabulary Continuous Speech Recognition
@article{Maas2014IncreasingDN,
  title={Increasing Deep Neural Network Acoustic Model Size for Large Vocabulary Continuous Speech Recognition},
  author={Andrew L. Maas and Awni Y. Hannun and Christopher T. Lengerich and Peng Qi and Dan Jurafsky and Andrew Y. Ng},
  journal={ArXiv},
  year={2014},
  volume={abs/1406.7806}
}
Deep neural networks (DNNs) are now a central component of nearly all state-of-the-art speech recognition systems. Part of the promise of DNNs is their ability to represent increasingly complex functions as the number of DNN parameters increases. This paper investigates the performance of DNN-based hybrid speech recognition systems as DNN model size and training data increase. Using a distributed GPU architecture, we train DNN acoustic models roughly an order of magnitude larger than those…
19 Citations
Asynchronous Decentralized Distributed Training of Acoustic Models
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2021
It is shown that ADPSGD with fixed and randomized communication patterns copes well with slow learners, and that the delay-by-one strategy yields the fastest convergence with large batches.
Deep Recurrent Neural Network Based Audio Speech Recognition System
- Computer Science
- 2021
This paper presents an approach to isolated word recognition based on deep learning models, in particular Recurrent Neural Networks (RNNs): a Bidirectional LSTM (BiLSTM) performs end-to-end speech recognition without any assumption of structure in the data.
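To make the BiLSTM idea concrete, here is a minimal PyTorch sketch (hypothetical, not the paper's code; all layer sizes and label counts are illustrative). A bidirectional LSTM runs over acoustic feature frames, a linear layer projects to per-frame label logits, and CTC loss handles alignment so that no structure in the data has to be assumed:

```python
import torch
import torch.nn as nn

class BiLSTMRecognizer(nn.Module):
    """Hypothetical BiLSTM acoustic model; sizes are illustrative."""
    def __init__(self, n_feats=40, hidden=256, n_labels=29):  # 28 symbols + CTC blank
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, n_labels)  # 2x: forward + backward states

    def forward(self, x):          # x: (batch, time, n_feats)
        out, _ = self.lstm(x)      # (batch, time, 2*hidden)
        return self.proj(out)      # per-frame label logits

model = BiLSTMRecognizer()
log_probs = model(torch.randn(8, 100, 40)).log_softmax(-1).transpose(0, 1)
# CTC marginalizes over frame-to-label alignments, so none are hand-specified.
loss = nn.CTCLoss(blank=0)(
    log_probs,                                 # (time, batch, labels)
    torch.randint(1, 29, (8, 12)),             # dummy target label sequences
    torch.full((8,), 100, dtype=torch.long),   # input lengths
    torch.full((8,), 12, dtype=torch.long))    # target lengths
```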
Spark-Based Parallel Deep Neural Network Model for Classification of Large Scale RNAs into piRNAs and Non-piRNAs
- Computer Science, Biology · IEEE Access
- 2020
A computational model based on a parallel deep neural network is presented for timely classification of large numbers of RNA sequences into piRNAs and non-piRNAs, taking advantage of a parallel and distributed computing platform.
Efficient Keyword Spotting through Hardware-Aware Conditional Execution of Deep Neural Networks
- Computer Science · 2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA)
- 2019
Results show the framework can generate cascade models optimized as a function of the class distribution, reducing computational cost by 87% for always-on operation while maintaining the baseline accuracy of the most complex model in the cascade.
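As a rough illustration of that conditional-execution pattern (hypothetical code, not the paper's framework), a small model can run always-on while the most complex model is invoked only when the small one is uncertain; the models, sizes, and confidence threshold below are all assumptions:

```python
import torch
import torch.nn as nn

def cascade_predict(x, small, large, threshold=0.9):
    """Two-stage cascade: cheap model always-on, expensive model on demand."""
    probs = small(x).softmax(-1)
    conf, label = probs.max(-1)
    if conf.item() >= threshold:
        return label.item()            # early exit: most inputs are easy
    return large(x).argmax(-1).item()  # conditionally execute the big model

small = nn.Sequential(nn.Linear(40, 16), nn.ReLU(), nn.Linear(16, 2))
large = nn.Sequential(nn.Linear(40, 256), nn.ReLU(), nn.Linear(256, 2))
print(cascade_predict(torch.randn(1, 40), small, large))
```

The saving comes from the skewed class distribution: when most audio windows are easy negatives, the large model runs only on the rare hard cases.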
CAVBench: A Benchmark Suite for Connected and Autonomous Vehicles
- Computer Science · 2018 IEEE/ACM Symposium on Edge Computing (SEC)
- 2018
CAVBench is the first benchmark suite for edge computing systems in the connected and autonomous vehicle (CAV) scenario; it provides quantitative evaluation results via application- and system-perspective output metrics, and is used to evaluate a typical edge computing platform.
Unifying Data, Model and Hybrid Parallelism in Deep Learning via Tensor Tiling
- Computer Science · ArXiv
- 2018
This work proposes an algorithm that finds the tiling that partitions tensors with the least overall communication, and builds the SoyBean system, which automatically transforms a serial dataflow graph captured by an existing deep learning system frontend into a parallel dataflow graph based on the optimal tiling it has found.
Cross database audio visual speech adaptation for phonetic spoken term detection
- Computer Science · Comput. Speech Lang.
- 2017
Recent advances in LVCSR: A benchmark comparison of performances
- Computer Science
- 2017
A benchmark comparison of the performances of current state-of-the-art LVCSR systems over different speech recognition tasks shows that Deep Neural Networks and Convolutional Neural Networks have proven their efficiency on several LVCSR tasks by outperforming traditional Hidden Markov Models and Gaussian Mixture Models.
A novel path planning method for biomimetic robot based on deep learning
- Computer Science
- 2016
A new deep learning based method for biomimetic robot path planning is proposed, built from convolutional kernels and max-pooling layers; the deep neural network outperforms the conventional method in both dynamic and static environments.
Automatic Speech Recognition with Deep Neural Networks for Impaired Speech
- Computer Science · IberSPEECH
- 2016
A comparison between architectures shows that, even with a small database, hybrid DNN-HMM models outperform classical GMM-HMM models according to word error rate measures.
References
Showing 1-10 of 23 references
Improving deep neural networks for LVCSR using rectified linear units and dropout
- Computer Science · 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
Modelling deep neural networks with rectified linear unit (ReLU) non-linearities with minimal human hyper-parameter tuning on a 50-hour English Broadcast News task shows a 4.2% relative improvement over a DNN trained with sigmoid units, and a 14.4% relative improvement over a strong GMM/HMM system.
On rectified linear units for speech processing
- Computer Science · 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
This work shows that generalization can be improved and the training of deep networks made faster and simpler by substituting the logistic units with rectified linear units.
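The substitution both of these entries describe is mechanically simple. A minimal PyTorch sketch (layer sizes are illustrative, not taken from either paper):

```python
import torch.nn as nn

# Classical sigmoid hidden layers vs. the ReLU replacement the papers describe.
sigmoid_net = nn.Sequential(nn.Linear(440, 2048), nn.Sigmoid(),
                            nn.Linear(2048, 2048), nn.Sigmoid(),
                            nn.Linear(2048, 9000))  # e.g. senone posteriors
relu_net = nn.Sequential(nn.Linear(440, 2048), nn.ReLU(),
                         nn.Linear(2048, 2048), nn.ReLU(),
                         nn.Linear(2048, 9000))
# ReLU computes max(0, x): gradients do not saturate for positive inputs,
# which is what makes training faster and simpler relative to logistic units.
```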
Rectifier Nonlinearities Improve Neural Network Acoustic Models
- Computer Science
- 2013
This work explores the use of deep rectifier networks as acoustic models for the 300-hour Switchboard conversational speech recognition task, and analyzes hidden layer representations to quantify differences in how rectified linear units encode inputs as compared to sigmoidal units.
Building high-level features using large scale unsupervised learning
- Computer Science · 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
Contrary to what appears to be a widely-held intuition, the experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not.
Deep learning with COTS HPC systems
- Computer Science · ICML
- 2013
This paper presents technical details and results from a system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology, a cluster of GPU servers with InfiniBand interconnects and MPI, and shows that it can scale to networks with over 11 billion parameters using just 16 machines.
Dropout Training as Adaptive Regularization
- Computer Science · NIPS
- 2013
By casting dropout as regularization, this work develops a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer, which consistently boosts the performance of dropout training and improves on state-of-the-art results on the IMDB reviews dataset.
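For reference, the training-time operation being reinterpreted is standard inverted dropout; a minimal NumPy sketch (not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(h, p=0.5, train=True):
    """Inverted dropout: zero each hidden unit with probability p in training."""
    if not train:
        return h                     # no noise at test time
    mask = rng.random(h.shape) >= p  # keep each unit with probability 1 - p
    return h * mask / (1.0 - p)      # rescale so the expected activation is unchanged

h = rng.standard_normal((4, 8))      # dummy batch of hidden activations
print(dropout_forward(h))
```

The paper's observation is that this multiplicative noise acts approximately like a feature-dependent L2 penalty, which is what makes the semi-supervised variant possible: unlabeled data can help estimate the regularizer even without labels.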
Improvements to Deep Convolutional Neural Networks for LVCSR
- Computer Science · 2013 IEEE Workshop on Automatic Speech Recognition and Understanding
- 2013
A deep analysis comparing limited weight sharing and full weight sharing with state-of-the-art features is conducted, and an effective strategy for using dropout during Hessian-free sequence training is introduced.
Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription
- Computer Science · 2013 IEEE Workshop on Automatic Speech Recognition and Understanding
- 2013
Recent improvements to YouTube's original system for automatic generation of closed captions are described, in particular the use of owner-uploaded video transcripts to generate additional semi-supervised training data, and deep neural network acoustic models with large state inventories.
Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets
- Computer Science · 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
A low-rank matrix factorization of the final weight layer is proposed and applied to DNNs for both acoustic modeling and language modeling, showing an equivalent reduction in training time and no significant loss in final recognition accuracy compared to a full-rank representation.
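A minimal sketch of the factorization idea (sizes are illustrative, not the paper's exact configuration): the large final weight matrix, of size hidden × outputs, is replaced by the product of two much thinner matrices with a small inner rank:

```python
import torch.nn as nn

hidden, outputs, rank = 2048, 9000, 256   # illustrative sizes only

full = nn.Linear(hidden, outputs)         # final layer: W is hidden x outputs
low_rank = nn.Sequential(                 # factor W into two thinner maps
    nn.Linear(hidden, rank, bias=False),  # hidden -> rank   (2048 x 256)
    nn.Linear(rank, outputs),             # rank   -> outputs (256 x 9000)
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(full), count(low_rank))       # ~18.4M vs ~2.8M parameters (~6.5x fewer)
```

With high-dimensional output targets (large state inventories), this final layer dominates the network's parameter count, which is why factoring it reduces training time so directly.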