Corpus ID: 6688465

Feature Learning in Deep Neural Networks - Studies on Speech Recognition Tasks.

@article{Yu2013FeatureLI,
  title={Feature Learning in Deep Neural Networks - Studies on Speech Recognition Tasks.},
  author={Dong Yu and Michael L. Seltzer and Jinyu Li and Jui-Ting Huang and Frank Seide},
  journal={arXiv: Learning},
  year={2013}
}
Recent studies have shown that deep neural networks (DNNs) perform significantly better than shallow networks and Gaussian mixture models (GMMs) on large vocabulary speech recognition tasks. In this paper, we argue that the improved accuracy achieved by the DNNs is the result of their ability to extract discriminative internal representations that are robust to the many sources of variability in speech signals. We show that these representations become increasingly insensitive to small… 

Figures and Tables from this paper

Citations

Evaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditions
TLDR
This work compares baseline mel-filterbank energies with previously proposed noise-robust features, showing that robust features improve the performance of DNNs and CNNs relative to mel-filterbank energies, and that vocal tract length normalization plays a positive role in improving the performance of the robust acoustic features.
Convolutional Neural Networks for Speech Recognition
TLDR
It is shown that further error rate reduction can be obtained by using convolutional neural networks (CNNs), and a limited-weight-sharing scheme is proposed that can better model speech features.
Robust Features in Deep-Learning-Based Speech Recognition
TLDR
This work uses robust features to create an invariant representation of the acoustic space, and leverages knowledge from auditory neuroscience and psychoacoustics, by using robust features inspired by auditory perception.
Noisy training for deep neural networks in speech recognition
TLDR
The experiments presented in this paper confirm that the noisy training approach works well for the DNN model and can provide substantial performance improvement for DNN-based speech recognition.
Improving robustness of deep neural networks via spectral masking for automatic speech recognition
  • Bo Li, K. Sim
  • 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013
TLDR
This work investigates two mask estimation approaches, namely state-dependent and deep neural network (DNN) based estimation, to separate speech from noise and improve the noise robustness of DNN acoustic models.
Building DNN acoustic models for large vocabulary speech recognition
Noisy training for deep neural networks
TLDR
The experiments presented in this paper confirm that the original assumptions of the noise injection approach largely hold when learning deep structures, and that noisy training may provide substantial performance improvement for DNN-based speech recognition.
Deep Neural Network Based Speech Recognition Systems Under Noise Perturbations
TLDR
This work investigates the noise immunity of various neural network models on a speech recognition task and demonstrates that the phoneme error rate (PER) degrades as the signal-to-noise ratio (SNR) decreases across all evaluated neural network models.
Improving Robustness of Deep Neural Network Acoustic Models via Speech Separation and Joint Adaptive Training
TLDR
A supervised speech separation system that significantly improves automatic speech recognition (ASR) performance in realistic noise conditions is presented and a framework that unifies separation and acoustic modeling via joint adaptive training is proposed.
Feature mapping using deep belief networks for robust speech recognition
TLDR
This paper proposes to use a deep belief network (DBN) as a post-processing method for denoising Mel-frequency cepstral coefficients (MFCCs), and uses it to extract tandem features from the denoised MFCCs to obtain more robust and discriminative features.

References

SHOWING 1-10 OF 26 REFERENCES
Exploiting sparseness in deep neural networks for large vocabulary speech recognition
TLDR
The goal of enforcing sparseness is formulated as soft regularization and convex constraint optimization problems; solutions under the stochastic gradient ascent setting are proposed, along with novel data structures that exploit the random sparseness patterns to reduce model size and computation time.
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
TLDR
A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output, and that can significantly outperform conventional context-dependent Gaussian mixture model (GMM)-HMMs.
Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM
TLDR
This paper presents the strategy of using mixed-bandwidth training data to improve wideband speech recognition accuracy in the CD-DNN-HMM framework, and shows that DNNs provide the flexibility of using arbitrary features.
Making Deep Belief Networks effective for large vocabulary continuous speech recognition
TLDR
This paper explores the performance of DBNs in a state-of-the-art LVCSR system, showing improvements over Multi-Layer Perceptrons (MLPs) and GMM/HMMs across a variety of features on an English Broadcast News task.
Roles of Pre-Training and Fine-Tuning in Context-Dependent DBN-HMMs for Real-World Speech Recognition
TLDR
It is shown that pre-training can initialize weights to a point in the space where fine-tuning can be effective and thus is crucial in training deep structured models and in the recognition performance of a CD-DBN-HMM based large-vocabulary speech recognizer.
Noise Adaptive Training for Robust Automatic Speech Recognition
TLDR
A noise adaptive training (NAT) algorithm, applicable to all training data, that normalizes environmental distortion as part of model training and is later used with vector Taylor series model adaptation for decoding noisy utterances at test time.
Adaptive Training with Joint Uncertainty Decoding for Robust Recognition of Noisy Data
  • H. Liao, M. Gales
  • 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07, 2007
TLDR
Joint adaptive training is presented, including formulae for estimating the transforms and canonical model parameters; results show that multistyle models benefit from VTS compensation or joint uncertainty decoding by reducing the mismatch between training and test.
Speaker and Noise Factorization for Robust Speech Recognition
TLDR
An acoustic factorization approach is adopted that allows the speaker characteristics obtained in one noise condition to be applied to a different environment and modified versions of MLLR and VTS training and application are derived.
Large vocabulary continuous speech recognition with context-dependent DBN-HMMS
TLDR
This work proposes a context-dependent DBN-HMM system that dramatically outperforms strong Gaussian mixture model (GMM)-HMM baselines on a challenging, large vocabulary, spontaneous speech recognition dataset from the Bing mobile voice search task.
Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription
TLDR
This work investigates the potential of Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, from a feature-engineering perspective to reduce the word error rate for speaker-independent transcription of phone calls.