Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets

@article{Sainath2013LowrankMF,
  title={Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets},
  author={Tara N. Sainath and Brian Kingsbury and Vikas Sindhwani and Ebru Arisoy and Bhuvana Ramabhadran},
  journal={2013 IEEE International Conference on Acoustics, Speech and Signal Processing},
  year={2013},
  pages={6655-6659}
}
While Deep Neural Networks (DNNs) have achieved tremendous success for large vocabulary continuous speech recognition (LVCSR) tasks, training of these networks is slow. One reason is that DNNs are trained with a large number of training parameters (i.e., 10-50 million). Because networks are trained with a large number of output targets to achieve good performance, the majority of these parameters are in the final weight layer. In this paper, we propose a low-rank matrix factorization of the… 
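Concretely, the proposed factorization replaces the final weight matrix with the product of two much smaller matrices, which is equivalent to inserting a low-rank linear bottleneck before the softmax output layer. The sketch below is a minimal illustration of the parameter saving (not the authors' code; the hidden-layer size, number of output targets, and rank are assumed for illustration).

```python
# Minimal sketch (illustrative, not the authors' code): the final weight layer
# W (num_targets x hidden_dim) is replaced by the product of two smaller
# matrices, i.e. a linear bottleneck of rank r before the softmax output layer.
import numpy as np

hidden_dim, num_targets, rank = 1024, 6000, 128   # assumed sizes

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h = np.random.randn(hidden_dim)               # activations of the last hidden layer

# Full-rank final layer: y = softmax(W h).
W = np.random.randn(num_targets, hidden_dim)
y_full = softmax(W @ h)

# Low-rank final layer: y = softmax(B (A h)).
A = np.random.randn(rank, hidden_dim)         # bottleneck projection
B = np.random.randn(num_targets, rank)        # output projection
y_lowrank = softmax(B @ (A @ h))

print("full-rank parameters:", W.size)            # 6,144,000
print("low-rank parameters: ", A.size + B.size)   # 899,072 (about 15% of the full layer)
```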

Citations

Sequence discriminative training for low-rank deep neural networks
TLDR
Low-rank approximation is effective for noisy speech, and the most effective combination of discriminative training with model reduction is to apply the low-rank approximation to the base model first and then perform discriminative training on the low-rank model.
Multi-lingual speech recognition with low-rank multi-task deep neural networks
  • Aanchan Mohan, R. Rose
  • Computer Science
    2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2015
TLDR
The LRMF proposed in this work for MTL is for the original language-specific block matrices to "share" a common matrix, with the resulting low-rank language-specific block matrices achieving performance competitive with a full-rank multi-lingual DNN.
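As a rough illustration of the parameter sharing described in the summary above (which factor is shared, and all sizes, are my assumptions, not details from the paper), the sketch below counts parameters when each language-specific output block is expressed as a small per-language matrix times one matrix common to all languages.

```python
# Rough sketch of low-rank multi-task factor sharing (assumed structure, not the
# paper's code): each language-specific output block W_l ~ B_l @ A, where A is a
# common matrix shared across languages and B_l is a small per-language factor.
import numpy as np

hidden_dim, targets_per_lang, rank, num_langs = 1024, 3000, 128, 4

A = np.random.randn(rank, hidden_dim)                     # shared common matrix
B = [np.random.randn(targets_per_lang, rank)              # per-language factors
     for _ in range(num_langs)]

full_params = num_langs * targets_per_lang * hidden_dim   # separate full-rank blocks
shared_params = A.size + sum(b.size for b in B)

print(f"full-rank blocks: {full_params:,} parameters")
print(f"shared low-rank:  {shared_params:,} parameters")
```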
Low-Rank Representation For Enhanced Deep Neural Network Acoustic Models
TLDR
Experiments demonstrate that low-rank representation can enhance posterior probability estimation and lead to higher ASR accuracy, and a novel hashing technique is proposed that exploits the low-rank property of posterior subspaces to enable fast search in the space of posterior exemplars.
Parameter Reduction For Deep Neural Network Based Acoustic Models Using Sparsity Regularized Factorization Neurons
TLDR
An approach is proposed that performs model parameter reduction during training while minimizing classification error: the product of three factorized matrices replaces a dense weight matrix, and a sparsity constraint drives entries of the center diagonal matrix to zero.
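The sketch below is my reading of that construction (dimensions and the pruning threshold are assumptions): a dense layer is parameterized as U · diag(d) · V, a sparsity penalty on d drives entries to zero during training, and the zeroed "factorization neurons" are then pruned.

```python
# Sketch of sparsity-regularized factorization (assumed details, not the paper's
# code): W is parameterized as U @ diag(d) @ V; training with a sparsity penalty
# pushes entries of the center diagonal d to zero, and those entries are pruned.
import numpy as np

m, n, k = 512, 512, 256
U = np.random.randn(m, k)
d = np.random.randn(k)
V = np.random.randn(k, n)

# Stand-in for the effect of the sparsity constraint after training.
d[np.abs(d) < 0.8] = 0.0

keep = np.flatnonzero(d)                      # surviving factorization neurons
U_p, d_p, V_p = U[:, keep], d[keep], V[keep, :]
W_pruned = U_p @ np.diag(d_p) @ V_p           # reduced layer, still m x n

print(f"kept {keep.size}/{k} diagonal entries; "
      f"parameters: {U_p.size + d_p.size + V_p.size:,} vs dense {m * n:,}")
```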
Low-Rank Representation For Enhanced Deep Neural Network Acoustic Models (Master Project Report)
TLDR
Experiments demonstrate that low-rank representation can enhance posterior probability estimation and lead to higher ASR accuracy, and a novel hashing technique is proposed that exploits the low-rank property of posterior subspaces to enable fast search in the space of posterior exemplars.
Small-footprint high-performance deep neural network-based speech recognition using split-VQ
TLDR
This work proposes to split each row vector of the weight matrices into sub-vectors and quantize them into a set of codewords using a split vector quantization (split-VQ) algorithm, and demonstrates that this method can further reduce the model size and save 10% to 50% of the computation on top of an already very compact SVD-DNN without noticeable performance degradation.
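A toy sketch of split-VQ as described in that summary follows (matrix sizes, sub-vector length, and codebook size are assumptions): rows are split into short sub-vectors, a single codebook is learned over them, and the matrix is stored as codeword indices plus the codebook.

```python
# Toy split-VQ sketch (assumed details, not the paper's implementation): split
# each row of a weight matrix into sub-vectors, learn a shared codebook over
# them, and store the matrix as codeword indices plus the codebook.
import numpy as np
from scipy.cluster.vq import kmeans, vq

rows, cols, sub_dim, num_codewords = 256, 512, 4, 256
W = np.random.randn(rows, cols)

sub_vectors = W.reshape(-1, sub_dim)                 # every row split into sub-vectors
codebook, _ = kmeans(sub_vectors, num_codewords, iter=5)
indices, _ = vq(sub_vectors, codebook)               # nearest codeword per sub-vector

W_quantized = codebook[indices].reshape(rows, cols)

float32_bytes = W.size * 4                           # if stored as float32
vq_bytes = indices.size * 1 + codebook.size * 4      # 8-bit indices + float32 codebook
print(f"storage: {float32_bytes:,} B -> {vq_bytes:,} B, "
      f"reconstruction MSE = {np.mean((W - W_quantized) ** 2):.4f}")
```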
A study of rank-constrained multilingual DNNs for low-resource ASR
TLDR
It is shown that properly applying low-rank factorization (LRF) of weight matrices via Singular Value Decomposition (SVD) to sparsify a multilingual DNN can improve recognition accuracy for multiple low-resource ASR configurations.
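As a small illustration of the SVD-based low-rank factorization (LRF) step mentioned above (matrix size and rank are assumptions for illustration, not values from the paper), a trained weight matrix can be approximated by two thinner factors built from its top singular values:

```python
# Sketch of SVD-based low-rank factorization of a trained weight matrix
# (sizes and rank are illustrative assumptions): keep the top-k singular values
# and replace W with the product of two thinner matrices.
import numpy as np

m, n, k = 2048, 2048, 256
W = np.random.randn(m, n)                    # stands in for a trained DNN layer

U, s, Vt = np.linalg.svd(W, full_matrices=False)
W1 = U[:, :k] * s[:k]                        # m x k (singular values folded in)
W2 = Vt[:k, :]                               # k x n

rel_err = np.linalg.norm(W - W1 @ W2) / np.linalg.norm(W)
print(f"parameters: {W.size:,} -> {W1.size + W2.size:,}, "
      f"relative Frobenius error = {rel_err:.3f}")
```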
Exploiting low-dimensional structures to enhance DNN based acoustic modeling in speech recognition
TLDR
The enhanced acoustic modeling method leads to improvements in a continuous speech recognition task using the hybrid DNN-HMM (hidden Markov model) framework in both clean and noisy conditions, where up to a 15.4% relative reduction in word error rate (WER) is achieved.
Recurrent Neural Network Compression Based on Low-Rank Tensor Representation
TLDR
This work evaluates the performance of all tensor-based RNN variants on sequence modeling tasks across a range of parameter counts, and proposes a TT-GRU that preserves performance on a speech recognition task while significantly reducing the number of GRU parameters compared to the uncompressed GRU.
...
...

References

SHOWING 1-10 OF 21 REFERENCES
Making Deep Belief Networks effective for large vocabulary continuous speech recognition
TLDR
This paper explores the performance of DBNs in a state-of-the-art LVCSR system, showing improvements over Multi-Layer Perceptrons (MLPs) and GMM/HMMs across a variety of features on an English Broadcast News task.
Deep Neural Network Language Models
TLDR
Results on a Wall Street Journal (WSJ) task demonstrate that DNN LMs offer improvements over a single hidden layer NNLM, and are competitive with a model M language model, considered to be one of the current state-of-the-art techniques for language modeling.
Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine
TLDR
This work uses the mean-covariance restricted Boltzmann machine (mcRBM) to learn features of speech data that serve as input into a standard DBN, and achieves a phone error rate superior to all published results on speaker-independent TIMIT to date.
Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization
TLDR
A distributed neural network training algorithm, based on Hessian-free optimization, that scales to deep networks and large data sets and yields relative reductions in word error rate of 7–13% over cross-entropy training with stochastic gradient descent on two larger tasks: Switchboard and DARPA RATS noisy Levantine Arabic.
Deep convolutional neural networks for LVCSR
TLDR
This paper determines the appropriate architecture to make CNNs effective compared to DNNs for LVCSR tasks, and explores the behavior of neural network features extracted from CNNs on a variety of LVCSR tasks, comparing CNNs to DNNs and GMMs.
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
TLDR
This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Deep Neural Networks for Acoustic Modeling in Speech Recognition
TLDR
This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.
Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling
  • Brian Kingsbury
  • Computer Science
    2009 IEEE International Conference on Acoustics, Speech and Signal Processing
  • 2009
TLDR
This paper demonstrates that neural-network acoustic models can be trained with sequence classification criteria using exactly the same lattice-based methods that have been developed for Gaussian mixture HMMs, and that using a sequence classification criterion in training leads to considerably better performance.
Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription
TLDR
This work investigates the potential of Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, from a feature-engineering perspective to reduce the word error rate for speaker-independent transcription of phone calls.
Connectionist Speech Recognition: A Hybrid Approach
From the Publisher: Connectionist Speech Recognition: A Hybrid Approach describes the theory and implementation of a method to incorporate neural network approaches into state-of-the-art continuous speech recognition systems.
...
...