Tandem connectionist feature extraction for conventional HMM systems
@inproceedings{Hermansky2000TandemCF, title={Tandem connectionist feature extraction for conventional HMM systems}, author={Hynek Hermansky and Daniel P. W. Ellis and Sangita Sharma}, booktitle={2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)}, year={2000}, volume={3}, pages={1635--1638} }
Hidden Markov model speech recognition systems typically use Gaussian mixture models to estimate the distributions of decorrelated acoustic feature vectors that correspond to individual subword units. By contrast, hybrid connectionist-HMM systems use discriminatively-trained neural networks to estimate the probability distribution among subword units given the acoustic observations. In this work we show a large improvement in word recognition performance by combining neural-net discriminative…
805 Citations
Deep Neural Networks for Acoustic Modeling in Speech Recognition
- Computer Science
- 2012
This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.
Tandem HMM with convolutional neural network for handwritten word recognition
- Computer Science
- 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
Convolutional neural networks can replace Gaussian mixtures to compute emission probabilities in hidden Markov models (hybrid combination), or serve as a feature extractor for a standard Gaussian HMM system (tandem combination).
Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling
- Computer Science
- 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2009
This paper demonstrates that neural-network acoustic models can be trained with sequence classification criteria using exactly the same lattice-based methods that have been developed for Gaussian mixture HMMs, and that using a sequence classification criterion in training leads to considerably better performance.
Dirichlet Mixture Models of neural net posteriors for HMM-based speech recognition
- Computer Science
- 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2011
A novel technique is presented for modeling the posterior probability estimates obtained from a neural network directly in the HMM framework using Dirichlet Mixture Models (DMMs); it outperforms the conventional TANDEM approach.
Hybrid architectures for speech recognition
- Computer Science
- 2011
The objective of this research is to develop hybrid architectures to improve the accuracy of ASR systems over the baseline HMM/GMM architecture.
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
- Computer Science
- IEEE Signal Processing Magazine
- 2012
This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Noise Robust Speech Recognition Using Deep Belief Networks
- Computer Science
- Int. J. Comput. Intell. Appl.
- 2016
Deep Belief Networks (DBNs) are used to extract discriminative information from a larger window of frames in GMMs, and the results indicate that this new method of feature encoding yields much better word recognition accuracy.
Dimensionality reduction methods for HMM phonetic recognition
- Computer Science
- 2010 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2010
Two nonlinear feature dimensionality reduction methods based on neural networks for an HMM-based phone recognition system are presented, and it is shown that recognition accuracies with the transformed features are slightly higher than those obtained with the original features and considerably higher than those obtained with linear dimensionality reduction methods.
Deep Neural Network-Hidden Markov Model Hybrid Systems
- Computer Science
- 2015
The architecture and the training procedure of the DNN-HMM hybrid system are described and the key components of such systems are pointed out by comparing a range of system setups.
Visual Speech Recognition Using PCA Networks and LSTMs in a Tandem GMM-HMM System
- Computer Science
- ACCV Workshops
- 2016
This work shows that the proposed method outperforms the baseline techniques applied to the OuluVS2 audiovisual database for phrase recognition with the frontal view, with cross-validation and testing sentence correctness reaching 79% and 73%, respectively, as compared to the baseline of 74% on cross-validation.
References
A NN/HMM hybrid for continuous speech recognition with a discriminant nonlinear feature extraction
- Computer Science
- Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181)
- 1998
A novel approach to set up a neural linear or nonlinear feature transformation that is used as a preprocessor on top of the HMM system's RBF-network to produce discriminative feature vectors that are well suited for being modeled by mixtures of Gaussian distributions.
Connectionist Speech Recognition: A Hybrid Approach
- Computer Science
- 1993
Connectionist Speech Recognition: A Hybrid Approach describes the theory and implementation of a method to incorporate neural network approaches into state-of-the-art continuous…
Feature extraction using non-linear transformation for robust speech recognition on the Aurora database
- Computer Science
- 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)
- 2000
It is shown that after a non-linear transformation, a number of features can be effectively used in an HMM-based recognition system.
Global optimization of a neural network-hidden Markov model hybrid
- Computer Science
- IEEE Trans. Neural Networks
- 1992
An original method for integrating artificial neural networks (ANN) with hidden Markov models (HMM) is presented, and results of speaker-independent recognition experiments using this integrated ANN-HMM system on the TIMIT continuous speech database are reported.
Data-Derived Non-Linear Mapping for Feature Extraction in HMM
- Computer Science
- 1999
A rather long temporal trajectory of critical band logarithmic power spectrum energy at a given frequency is used as an input feature vector in an MLP-based phoneme classifier, trained on a…
Speech/music discrimination based on posterior probability features
- Computer Science
- EUROSPEECH
- 1999
Acoustic confidence measures in connectionist speech recognition, Ph.D. thesis, Dept. of Computer Science, Univ. of Sheffield, 1999.
Perceptual linear predictive (PLP) analysis of speech.
- Physics
- The Journal of the Acoustical Society of America
- 1990
A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, which uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum, and yields a low-dimensional representation of speech.
Multi-stream speech recognition: ready for prime time?
- Computer Science
- EUROSPEECH
- 1999
This paper found that multi-stream systems using different acoustic front-ends provide a significant improvement over single-stream systems. However, despite success on smaller tasks, no improvement has yet been shown using multiband methods.
Recognizing reverberant speech with RASTA-PLP
- Engineering, Physics
- 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing
- 1997
The authors' experimental variant on RASTA processing provides a statistically significant improvement in performance on the reverberant speech, with a best word error rate of 64.1%.
Perceptually inspired signal processing strategies for robust speech recognition in reverberant environments
- Physics
- 1998
This work presents perceptually inspired signal-processing strategies for robust speech recognition in reverberant environments, a novel approach to signal processing that automates the very labor-intensive and therefore time-heavy and expensive process of recognizing speech.