Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets
@article{Sainath2013LowrankMF,
  title={Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets},
  author={Tara N. Sainath and Brian Kingsbury and Vikas Sindhwani and Ebru Arisoy and Bhuvana Ramabhadran},
  journal={2013 IEEE International Conference on Acoustics, Speech and Signal Processing},
  year={2013},
  pages={6655-6659}
}
While Deep Neural Networks (DNNs) have achieved tremendous success on large vocabulary continuous speech recognition (LVCSR) tasks, training these networks is slow. One reason is that DNNs are trained with a large number of parameters (i.e., 10-50 million). Because networks need a large number of output targets to achieve good performance, the majority of these parameters sit in the final weight layer. In this paper, we propose a low-rank matrix factorization of the…
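The heart of the method is to replace the final weight layer, which holds most of the parameters when the number of output targets is large, with a product of two low-rank factors trained jointly from scratch. A minimal PyTorch sketch of the parameter savings (the layer sizes and rank here are illustrative, not the paper's exact configuration):

```python
import torch.nn as nn

# Illustrative sizes: hidden dimension h, number of context-dependent
# output targets K (large for LVCSR), and a chosen rank r << min(h, K).
h, K, r = 1024, 9300, 128

# Full-rank final layer: h * K weights plus K biases.
full_layer = nn.Linear(h, K)

# Low-rank factorization: the h x K weight matrix becomes the product of
# an h x r and an r x K matrix. The inner projection carries no bias,
# matching a pure matrix factorization W = A @ B.
low_rank_layer = nn.Sequential(
    nn.Linear(h, r, bias=False),  # A: projects h -> r
    nn.Linear(r, K),              # B: projects r -> K (+ output bias)
)

n_full = sum(p.numel() for p in full_layer.parameters())
n_low = sum(p.numel() for p in low_rank_layer.parameters())
print(f"full: {n_full:,} params, low-rank: {n_low:,} ({100 * n_low / n_full:.0f}%)")
```

At these sizes the factorized layer keeps roughly 14% of the full layer's parameters.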
482 Citations
Sequence discriminative training for low-rank deep neural networks
- Computer Science2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP)
- 2014
Low-rank approximation is shown to be effective for noisy speech, and the most effective combination of discriminative training with model reduction is to apply the low-rank approximation to the base model first and then perform discriminative training on the low-rank model.
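For the pipeline this entry describes, the low-rank approximation of an already-trained layer is typically obtained with a truncated SVD; a minimal numpy sketch under that assumption (matrix sizes and rank are made up for illustration):

```python
import numpy as np

def low_rank_split(W, r):
    """Approximate a trained h x K weight matrix W by factors A (h x r)
    and B (r x K) via truncated SVD, so that A @ B ~= W. The two factors
    then replace W and are fine-tuned further, e.g. with a
    sequence-discriminative criterion applied after the rank reduction."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * s[:r]   # fold the singular values into the left factor
    B = Vt[:r, :]
    return A, B

# Random matrix standing in for a trained layer.
W = np.random.default_rng(0).standard_normal((512, 3000))
A, B = low_rank_split(W, r=128)
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"relative Frobenius error at rank 128: {err:.3f}")
```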
Multi-lingual speech recognition with low-rank multi-task deep neural networks
- Computer Science2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2015
The low-rank matrix factorization (LRMF) proposed in this work for multi-task learning (MTL) has the original language-specific block matrices "share" a common matrix; the resulting low-rank language-specific block matrices achieve performance competitive with a full-rank multilingual DNN.
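A rough PyTorch sketch of the shared-factor idea, assuming one common h x r projection with a small language-specific r x K_l output matrix per language (dimensions and per-language target counts below are hypothetical):

```python
import torch
import torch.nn as nn

h, r = 1024, 128
targets = {"en": 5000, "fr": 4200, "de": 4700}  # hypothetical target counts

# One h x r matrix shared by all languages; each language keeps only an
# r x K_l output matrix, so its block of the output layer is low-rank.
shared = nn.Linear(h, r, bias=False)
heads = nn.ModuleDict({lang: nn.Linear(r, K) for lang, K in targets.items()})

def logits(hidden, lang):
    return heads[lang](shared(hidden))

x = torch.randn(8, h)         # a batch of top-layer hidden activations
print(logits(x, "fr").shape)  # torch.Size([8, 4200])
```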
Low-Rank Representation For Enhanced Deep Neural Network Acoustic Models
- Computer Science
- 2016
Experiments demonstrate that low-rank representation can enhance posterior probability estimation and lead to higher ASR accuracy; a novel hashing technique exploiting the low-rank property of posterior subspaces is also proposed, enabling fast search in the space of posterior exemplars.
Parameter Reduction For Deep Neural Network Based Acoustic Models Using Sparsity Regularized Factorization Neurons
- Computer Science2019 International Joint Conference on Neural Networks (IJCNN)
- 2019
An approach is proposed that performs model parameter reduction during training itself, from the perspective of minimizing classification error: each dense weight matrix is replaced by the product of three factorized matrices, and a sparsity constraint drives entries of the center diagonal matrix to zero.
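A minimal PyTorch sketch of such a three-factor layer with an L1 penalty on the center diagonal (initialization and penalty weight are illustrative):

```python
import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    """Replaces a dense out_dim x in_dim weight matrix with U @ diag(s) @ V.
    An L1 penalty on s pushes diagonal entries to zero during training;
    the matching columns of U and rows of V can then be pruned."""
    def __init__(self, in_dim, out_dim, r):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_dim, r) * 0.01)
        self.s = nn.Parameter(torch.ones(r))  # center diagonal
        self.V = nn.Parameter(torch.randn(r, in_dim) * 0.01)

    def forward(self, x):
        # (batch, in) @ V^T -> (batch, r), scale by s, @ U^T -> (batch, out)
        return ((x @ self.V.t()) * self.s) @ self.U.t()

    def l1_penalty(self):
        return self.s.abs().sum()

layer = FactorizedLinear(1024, 3000, r=256)
x = torch.randn(8, 1024)
task_loss = layer(x).pow(2).mean()            # stand-in for the real loss
loss = task_loss + 1e-4 * layer.l1_penalty()  # sparsity-regularized objective
```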
Low-Rank Representation For Enhanced Deep Neural Network Acoustic Models Master Project Report
- Computer Science
- 2016
Experiments demonstrate that low-rank representation can enhance posterior probability estimation and lead to higher ASR accuracy; a novel hashing technique exploiting the low-rank property of posterior subspaces is also proposed, enabling fast search in the space of posterior exemplars.
Small-footprint high-performance deep neural network-based speech recognition using split-VQ
- Computer Science2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2015
This work proposes splitting each row vector of the weight matrices into sub-vectors and quantizing them into a set of codewords with a split vector quantization (split-VQ) algorithm, demonstrating that the method can further reduce model size and save 10% to 50% of computation on top of an already very compact SVD-DNN without noticeable performance degradation.
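A rough sketch of the split-VQ idea using scikit-learn's k-means; the single shared codebook, sub-vector length, and codebook size are assumptions for illustration rather than the cited work's exact settings:

```python
import numpy as np
from sklearn.cluster import KMeans

def split_vq(W, sub_dim=4, n_codewords=256):
    """Split each row of W into length-sub_dim sub-vectors and quantize
    them against one k-means codebook; W is then stored as the codebook
    plus one small index per sub-vector."""
    out_dim, in_dim = W.shape
    assert in_dim % sub_dim == 0
    subs = W.reshape(-1, sub_dim)
    km = KMeans(n_clusters=n_codewords, n_init=1, random_state=0).fit(subs)
    codebook = km.cluster_centers_.astype(W.dtype)
    idx = km.labels_.astype(np.uint8)  # 256 codewords fit in one byte
    W_hat = codebook[idx].reshape(out_dim, in_dim)
    return codebook, idx, W_hat

W = np.random.default_rng(0).standard_normal((256, 512)).astype(np.float32)
codebook, idx, W_hat = split_vq(W)
print(f"{W.nbytes} -> {codebook.nbytes + idx.nbytes} bytes, "
      f"rel. error {np.linalg.norm(W - W_hat) / np.linalg.norm(W):.3f}")
```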
A study of rank-constrained multilingual DNNs for low-resource ASR
- Computer Science2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2016
It is shown that properly applying low-rank factorization (LRF) of weight matrices via Singular Value Decomposition (SVD) to sparsify a multilingual DNN can improve recognition accuracy for multiple low-resource ASR configurations.
Exploiting low-dimensional structures to enhance DNN based acoustic modeling in speech recognition
- Computer Science2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2016
The enhanced acoustic modeling method yields improvements on a continuous speech recognition task using the hybrid DNN-HMM (hidden Markov model) framework in both clean and noisy conditions, achieving up to a 15.4% relative reduction in word error rate (WER).
Recurrent Neural Network Compression Based on Low-Rank Tensor Representation
- Computer ScienceIEICE Trans. Inf. Syst.
- 2020
This work evaluates the performance of tensor-based RNNs on sequence modeling tasks across varying numbers of parameters, and proposes a TT-GRU that, on a speech recognition task, preserves performance while significantly reducing the number of GRU parameters compared to the uncompressed GRU.
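A minimal numpy sketch of the tensor-train construction (the TT-SVD algorithm) underlying such compression; the mode shapes and TT-rank below are illustrative, not the paper's configuration:

```python
import numpy as np

def tt_svd(W, in_modes, out_modes, max_rank):
    """Factor a dense weight matrix into tensor-train (TT) cores via
    sequential truncated SVDs. A TT-RNN stores its weight matrices in
    this form, trading a small approximation error (or retraining) for
    a large reduction in parameters."""
    d = len(in_modes)
    # View W as a 2d-way tensor and interleave the (in_k, out_k) modes.
    T = W.reshape(*in_modes, *out_modes)
    T = T.transpose([ax for k in range(d) for ax in (k, d + k)])
    cores, rank = [], 1
    for k in range(d - 1):
        T = T.reshape(rank * in_modes[k] * out_modes[k], -1)
        U, s, Vt = np.linalg.svd(T, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(U[:, :r].reshape(rank, in_modes[k], out_modes[k], r))
        T, rank = s[:r, None] * Vt[:r], r
    cores.append(T.reshape(rank, in_modes[-1], out_modes[-1], 1))
    return cores

# A 256 x 1024 matrix viewed as modes (4, 8, 8) x (8, 8, 16), TT-rank 8.
W = np.random.default_rng(0).standard_normal((256, 1024))
cores = tt_svd(W, (4, 8, 8), (8, 8, 16), max_rank=8)
print(f"dense: {W.size:,} params, TT: {sum(c.size for c in cores):,}")
```

Here the three cores hold 5,376 values against 262,144 in the dense matrix.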
References
Making Deep Belief Networks effective for large vocabulary continuous speech recognition
- Computer Science2011 IEEE Workshop on Automatic Speech Recognition & Understanding
- 2011
This paper explores the performance of DBNs in a state-of-the-art LVCSR system, showing improvements over Multi-Layer Perceptrons (MLPs) and GMM/HMMs across a variety of features on an English Broadcast News task.
Deep Neural Network Language Models
- Computer ScienceWLM@NAACL-HLT
- 2012
Results on a Wall Street Journal (WSJ) task demonstrate that DNN LMs offer improvements over a single-hidden-layer NNLM, and are competitive with a Model M language model, considered one of the current state-of-the-art techniques for language modeling.
Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine
- Computer Science, PhysicsNIPS
- 2010
This work uses the mean-covariance restricted Boltzmann machine (mcRBM) to learn features of speech data that serve as input into a standard DBN, and achieves a phone error rate superior to all published results on speaker-independent TIMIT to date.
Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization
- Computer ScienceINTERSPEECH
- 2012
A distributed neural network training algorithm, based on Hessian-free optimization, that scales to deep networks and large data sets and yields relative reductions in word error rate of 7-13% over cross-entropy training with stochastic gradient descent on two larger tasks: Switchboard and DARPA RATS noisy Levantine Arabic.
Deep convolutional neural networks for LVCSR
- Computer Science2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
This paper determines the appropriate architecture to make CNNs effective compared to DNNs for LVCSR tasks, and explores the behavior of neural network features extracted from CNNs on a variety of LVCSR tasks, comparing CNNs to DNNs and GMMs.
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
- Computer ScienceIEEE Signal Processing Magazine
- 2012
This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling
- Computer Science2009 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2009
This paper demonstrates that neural-network acoustic models can be trained with sequence classification criteria using exactly the same lattice-based methods that have been developed for Gaussian mixture HMMs, and that using a sequence classification criterion in training leads to considerably better performance.
Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription
- Computer Science2011 IEEE Workshop on Automatic Speech Recognition & Understanding
- 2011
This work investigates the potential of Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, from a feature-engineering perspective to reduce the word error rate for speaker-independent transcription of phone calls.
Connectionist Speech Recognition: A Hybrid Approach
- Computer Science
- 1993
From the Publisher:
Connectionist Speech Recognition: A Hybrid Approach describes the theory and implementation of a method to incorporate neural network approaches into state-of-the-art continuous…