Corpus ID: 14092849

An Overview of Discriminative Training for Speech Recognition

@inproceedings{Vertanen2005AnOO,
  title={An Overview of Discriminative Training for Speech Recognition},
  author={Keith Vertanen},
  year={2005}
}
This paper gives an overview of discriminative training as it pertains to the speech recognition problem. The basic theory of discriminative training will be discussed and an explanation of maximum mutual information (MMI) given. Common problems inherent to discriminative training will be explored, as well as practicalities associated with implementing discriminative training for large-vocabulary recognition. Alternatives to the MMI objective function such as minimum word error (MWE) and minimum…
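The MMI criterion mentioned in the abstract maximizes the posterior probability of the correct transcription against all competing hypotheses. A minimal toy sketch of that idea (the finite hypothesis list stands in for the recognizer's full search space; this is illustrative, not the paper's implementation):

```python
import math

def mmi_objective(scores, ref):
    """Toy MMI criterion for a single utterance.

    scores: combined acoustic + language-model log-scores, one per hypothesis
    ref:    index of the correct (reference) transcription

    MMI maximizes log P(ref | observations) = score(ref) - logsumexp(scores),
    boosting the correct transcription while suppressing all competitors.
    """
    denom = math.log(sum(math.exp(s) for s in scores))  # log-sum over all hypotheses
    return scores[ref] - denom
```

For hypothesis posteriors 0.5/0.3/0.2 the criterion equals log 0.5 for the first hypothesis; training pushes it toward 0, i.e. posterior 1 for the reference.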


Discriminative splitting of Gaussian/log-linear mixture HMMs for speech recognition
TLDR
This paper presents a method to incorporate mixture density splitting into the acoustic model discriminative log-linear training, and achieves large gains in the objective function and corresponding moderate losses in the word error rate on a large vocabulary corpus.
Simultaneous Discriminative Training and Mixture Splitting of HMMs for Speech Recognition
TLDR
This paper incorporates the state of the art minimum phone error training criterion into the framework, and shows that after discriminative splitting, a subsequent log-linear MPE training achieves better results than Gaussian mixture model MPE optimization alone.
Discriminatively trained continuous Hindi speech recognition system using interpolated recurrent neural network language modeling
TLDR
The results indicate that the use of RNN-based language modeling enhances the performance of the ASR system, and the proposed system introduces the concept of speaker adaptation using the maximum likelihood linear regression technique.
Using Discriminative Training Techniques in Practical Intelligent Music Retrieval System
TLDR
A pop-song music retrieval system for telecom carriers is presented to facilitate the interactions between end users and the music database; model optimization techniques are also considered.
Discriminatively trained continuous Hindi speech recognition using integrated acoustic features and recurrent neural network language modeling
TLDR
The proposed Hindi ASR system shows significant performance improvement over other current state-of-the-art techniques.
Maximum-Minimum Similarity Training for Text Extraction
TLDR
This paper uses the discriminative training criterion of maximum-minimum similarity (MMS) to improve the performance of text extraction based on Gaussian mixture modeling of neighbor characters and defines the corresponding objective function for text extraction.
Discriminative Training Using Noise Robust Integrated Features and Refined HMM Modeling
TLDR
The proposed work discusses the implementation of a discriminatively trained Hindi ASR system using noise-robust integrated features and a refined HMM model, and results show that discriminative training using MPE with the MF-GFCC integrated feature vector and PSO-HMM parameter refinement gives significantly better results than the other implemented techniques.
A discriminated correlation classifier for face recognition
TLDR
A novel classifier called Discriminated Semi-Normalized Correlation (DSNC) is proposed using the discriminative learning method; it needs only one intra-class sample and can be applied to the open-set face recognition problem.
Automatic Speech Recognition for Real Time Systems
TLDR
To train the ASR for MoD, this work experiments with the classical HMM-based approach and DeepSpeech2 on the Voxforge dataset, and fine-tunes the DeepSpeech2 model on MoD data to achieve a 14.727% Word Error Rate (WER).
Text Extraction Based on Maximum-Minimum Similarity Training Method
TLDR
A maximum-minimum similarity training algorithm is proposed to optimize the parameters of an effective text extraction method based on Gaussian mixture modeling of neighbor characters; the gradient descent method is used to search for the minimum of the objective function and the optimum parameters for the text extraction method.
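The gradient-descent search mentioned in the summary above can be sketched generically. The quadratic stand-in objective below is an assumption for illustration; the papers would use the maximum-minimum similarity criterion instead:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Plain 1-D gradient descent: repeatedly step against the gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Toy stand-in objective f(x) = (x - 3)**2 with gradient 2*(x - 3);
# in the MMS papers the objective would be the similarity-based criterion.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
```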
...

References

SHOWING 1-10 OF 29 REFERENCES
Large scale discriminative training for speech recognition
TLDR
These experiments represent the largest-scale application of discriminative training techniques for speech recognition, and have led to significant reductions in word error rate for both triphone and quinphone HMMs compared to the best models trained using maximum likelihood estimation.
Discriminative training of language models for speech recognition
TLDR
This paper describes the algorithm and demonstrates modest improvements in word and sentence error rates on the DARPA Communicator task without any increase in language model complexity.
Maximum mutual information estimation of HMM parameters for continuous speech recognition using the N-best algorithm
  • Y. Chow
  • Computer Science
    International Conference on Acoustics, Speech, and Signal Processing
  • 1990
An application of discriminative training methods, maximum mutual information (MMI) training, to large-vocabulary continuous speech recognition is described. An algorithm is developed for efficient
Large scale discriminative training of hidden Markov models for speech recognition
TLDR
It is shown that HMMs trained with MMIE benefit as much as MLE-trained HMMs from applying model adaptation using maximum likelihood linear regression (MLLR), which has allowed the straightforward integration of MMIE-trained HMMs into complex multi-pass systems for transcription of conversational telephone speech.
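MLLR, referenced above, adapts Gaussian mean vectors with an affine transform estimated from a speaker's adaptation data. A minimal sketch (the 2-dimensional example values are illustrative):

```python
def mllr_adapt_mean(mean, A, b):
    """Apply an MLLR mean transform: mu' = A @ mu + b.

    In practice A and b are estimated by maximum likelihood from a
    speaker's adaptation data and shared across many Gaussians.
    """
    return [sum(a * m for a, m in zip(row, mean)) + bi
            for row, bi in zip(A, b)]

# Identity rotation plus a bias shift on a 2-D mean vector.
adapted = mllr_adapt_mean([1.0, 2.0], A=[[1.0, 0.0], [0.0, 1.0]], b=[0.5, -0.5])
```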
INTERDEPENDENCE OF LANGUAGE MODELS AND DISCRIMINATIVE TRAINING
TLDR
A constrained recognition approach using word graphs is presented for the efficient determination of alternative word sequences for discriminative training and shows a significant dependence on the context length of the language model used for training.
On a model-robust training method for speech recognition
TLDR
For minimizing the decoding error rate of the (optimal) maximum a posteriori probability (MAP) decoder, it is shown that the CMLE (or maximum mutual information estimate, MMIE) may be preferable when the model is incorrect.
Minimum Phone Error and I-smoothing for improved discriminative training
  • Daniel Povey, P. Woodland
  • Computer Science
    2002 IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 2002
TLDR
The Minimum Phone Error (MPE) and Minimum Word Error (MWE) criteria are smoothed approximations to the phone or word error rate, respectively; I-smoothing is a novel technique for smoothing discriminative training criteria using statistics for maximum likelihood estimation (MLE).
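The I-smoothing idea described above can be sketched as backing off a discriminative estimate toward tau "points" of ML statistics. This is a simplified single-parameter version; the value of tau and the statistics are illustrative:

```python
def i_smoothed_mean(disc_sum, disc_count, ml_mean, tau):
    """Blend a discriminative sufficient statistic with tau points of
    ML statistics: as tau -> 0 this is the pure discriminative estimate;
    for large tau it falls back to the ML mean."""
    return (disc_sum + tau * ml_mean) / (disc_count + tau)
```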
Minimum classification error rate methods for speech recognition
TLDR
The issue of speech recognizer training from a broad perspective with root in the classical Bayes decision theory is discussed, and the superiority of the minimum classification error (MCE) method over the distribution estimation method is shown by providing the results of several key speech recognition experiments.
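The MCE method summarized above replaces the non-differentiable 0/1 classification error with a smoothed sigmoid loss over a misclassification measure. A minimal sketch (eta and gamma are illustrative smoothing constants):

```python
import math

def mce_loss(scores, ref, eta=2.0, gamma=1.0):
    """Smoothed classification-error loss of MCE.

    d compares a soft-max over competitor scores against the reference
    class score; the sigmoid turns d into a differentiable stand-in
    for the 0/1 classification error.
    """
    competitors = [s for i, s in enumerate(scores) if i != ref]
    anti = (1.0 / eta) * math.log(
        sum(math.exp(eta * g) for g in competitors) / len(competitors)
    )
    d = anti - scores[ref]          # misclassification measure
    return 1.0 / (1.0 + math.exp(-gamma * d))
```

When the reference outscores its competitors the loss falls below 0.5 (toward a correct decision); when a competitor wins, it rises above 0.5.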
A compact model for speaker-adaptive training
TLDR
A novel approach to estimating the parameters of continuous density HMMs for speaker-independent (SI) continuous speech recognition that jointly annihilates the inter-speaker variation and estimates the HMM parameters of the SI acoustic models.
...