Tandem connectionist feature extraction for conventional HMM systems

  • Hynek Hermansky, Daniel P. W. Ellis, Sangita Sharma
  • Published 5 June 2000
  • Computer Science
  • 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), pages 1635-1638, vol. 3
Hidden Markov model speech recognition systems typically use Gaussian mixture models to estimate the distributions of decorrelated acoustic feature vectors that correspond to individual subword units. By contrast, hybrid connectionist-HMM systems use discriminatively-trained neural networks to estimate the probability distribution among subword units given the acoustic observations. In this work we show a large improvement in word recognition performance by combining neural-net discriminative… 
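
The abstract is cut off before it describes the combination in detail, so the following is only a minimal sketch of one plausible reading of the tandem idea: per-frame class posteriors from a neural network are turned into decorrelated feature vectors that a conventional Gaussian-mixture HMM system could then model. The log compression, the PCA-based decorrelation, the function name `tandem_features`, and the synthetic posteriors are all assumptions made for illustration, not details taken from the text above.

```python
import numpy as np


def tandem_features(posteriors, n_components=None, eps=1e-10):
    """Turn per-frame class posteriors into decorrelated feature vectors.

    posteriors: array of shape (n_frames, n_classes), rows summing to 1.
    n_components: number of PCA dimensions to keep (None keeps all).
    Both the log step and the PCA step are illustrative assumptions.
    """
    # Log compression makes the skewed posterior values less peaky.
    log_post = np.log(np.clip(posteriors, eps, 1.0))
    centered = log_post - log_post.mean(axis=0)

    # PCA: eigen-decomposition of the feature covariance matrix.
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # largest variance first
    basis = eigvecs[:, order[:n_components]]

    # Projected features have a diagonal covariance by construction.
    return centered @ basis


# Synthetic stand-in for network outputs: softmax over random logits.
rng = np.random.default_rng(0)
logits = rng.normal(size=(500, 8))
exp_logits = np.exp(logits - logits.max(axis=1, keepdims=True))
posteriors = exp_logits / exp_logits.sum(axis=1, keepdims=True)

features = tandem_features(posteriors, n_components=6)
print(features.shape)
```

In a real system the PCA basis would be estimated on training data and reused unchanged at test time; the sketch computes it on the same frames only to stay short.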

Deep Neural Networks for Acoustic Modeling in Speech Recognition
This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.
Tandem HMM with convolutional neural network for handwritten word recognition
Convolutional neural networks can replace Gaussian mixtures to compute emission probabilities in hidden Markov models (hybrid combination), or serve as a feature extractor for a standard Gaussian HMM system (tandem combination).
Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling
  • Brian Kingsbury
  • Computer Science
    2009 IEEE International Conference on Acoustics, Speech and Signal Processing
  • 2009
This paper demonstrates that neural-network acoustic models can be trained with sequence classification criteria using exactly the same lattice-based methods that have been developed for Gaussian mixture HMMs, and that using a sequence classification criterion in training leads to considerably better performance.
Dirichlet Mixture Models of neural net posteriors for HMM-based speech recognition
A novel technique for modeling the posterior probability estimates obtained from a neural network directly in the HMM framework using Dirichlet Mixture Models (DMMs), which outperforms the conventional TANDEM approach.
Hybrid architectures for speech recognition
  • Computer Science
  • 2011
The objective of this research is to develop hybrid architectures to improve the accuracy of ASR systems over the baseline HMM/GMM architecture.
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Noise Robust Speech Recognition Using Deep Belief Networks
Deep Belief Networks (DBNs) are used to extract discriminative information from a larger window of frames in GMMs, and the results indicate that this new method of feature encoding yields much better word recognition accuracy.
Dimensionality reduction methods for HMM phonetic recognition
  • Hongbing Hu, S. Zahorian
  • Computer Science
    2010 IEEE International Conference on Acoustics, Speech and Signal Processing
  • 2010
Two nonlinear feature dimensionality reduction methods based on neural networks for an HMM-based phone recognition system are presented, and it is shown that recognition accuracies with the transformed features are slightly higher than those obtained with the original features and considerably higher than those obtained with linear dimensionality reduction methods.
Deep Neural Network-Hidden Markov Model Hybrid Systems
The architecture and the training procedure of the DNN-HMM hybrid system are described and the key components of such systems are pointed out by comparing a range of system setups.
Visual Speech Recognition Using PCA Networks and LSTMs in a Tandem GMM-HMM System
This work shows that the proposed method outperforms the baseline techniques applied to the OuluVS2 audiovisual database for phrase recognition, with the frontal view cross-validation and testing sentence correctness reaching 79% and 73%, respectively, as compared to the baseline of 74% on cross-validation.


A NN/HMM hybrid for continuous speech recognition with a discriminant nonlinear feature extraction
  • G. Rigoll, D. Willett
  • Computer Science
    Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181)
  • 1998
A novel approach to set up a neural linear or nonlinear feature transformation that is used as a preprocessor on top of the HMM system's RBF-network to produce discriminative feature vectors that are well suited for being modeled by mixtures of Gaussian distributions.
Connectionist Speech Recognition: A Hybrid Approach
From the Publisher: Connectionist Speech Recognition: A Hybrid Approach describes the theory and implementation of a method to incorporate neural network approaches into state-of-the-art continuous
Feature extraction using non-linear transformation for robust speech recognition on the Aurora database
It is shown that after a non-linear transformation, a number of features can be effectively used in an HMM-based recognition system.
Global optimization of a neural network-hidden Markov model hybrid
An original method for integrating artificial neural networks (ANN) with hidden Markov models (HMM) is presented, and results of speaker-independent recognition experiments using this integrated ANN-HMM system on the TIMIT continuous speech database are reported.
Data-Derived Non-Linear Mapping for Feature Extraction in HMM
A rather long temporal trajectory of critical band logarithmic power spectrum energy at a given frequency is used as an input feature vector in an MLP-based phoneme classifier, trained on a
Speech/music discrimination based on posterior probability features
Acoustic confidence measures in connectionist speech recognition, Ph.D. thesis, Dept. of Computer Science, Univ. of Sheffield, 1999.
Perceptual linear predictive (PLP) analysis of speech.
  • H. Hermansky
  • Physics
    The Journal of the Acoustical Society of America
  • 1990
A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, which uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum, and yields a low-dimensional representation of speech.
Multi-stream speech recognition: ready for prime time?
This paper found that multi-stream systems using different acoustic front-ends provide a significant improvement over single-stream systems. Multiband methods, however, despite being successful on smaller tasks, have not yet been able to show any improvement.
Recognizing reverberant speech with RASTA-PLP
  • Brian Kingsbury, N. Morgan
  • Engineering, Physics
    1997 IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 1997
The authors' experimental variant on RASTA processing provides a statistically significant improvement in performance on the reverberant speech, with a best word error rate of 64.1%.
Perceptually inspired signal processing strategies for robust speech recognition in reverberant environments
This work presents perceptually inspired signal-processing strategies for robust speech recognition in reverberant environments, a novel approach to signal processing that automates the very labor-intensive and therefore time-consuming and expensive process of recognizing speech.