Discriminative n-gram language modeling

@article{Roark2007DiscriminativeNL,
  title={Discriminative n-gram language modeling},
  author={Brian Roark and Murat Saraçlar and Michael Collins},
  journal={Comput. Speech Lang.},
  year={2007},
  volume={21},
  pages={373--392}
}
A Decade of Discriminative Language Modeling for Automatic Speech Recognition
TLDR
This paper summarizes the research on discriminative language modeling focusing on its application to automatic speech recognition (ASR) and generalizes DLM training by either using automatic transcriptions for the positive examples or simulating the negative examples.
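Though details vary across the surveyed systems, the basic training setup is easy to sketch. Below is a minimal illustration in Python (function and variable names are hypothetical, not from the paper): positive examples come from reference transcriptions (or, in the semi-supervised variant, automatic transcriptions), and negatives are competing hypotheses from the recognizer's n-best list.

```python
def make_training_pairs(corpus):
    """Yield (positive, negative) sentence pairs for DLM training.

    corpus: iterable of (reference, n_best) pairs, where n_best is a
    list of hypothesis strings ranked by first-pass recognizer score.
    In the semi-supervised variant, `reference` would itself be an
    automatic transcription rather than a manual one.
    """
    for reference, n_best in corpus:
        for hypothesis in n_best:
            if hypothesis != reference:      # every erroneous hypothesis
                yield reference, hypothesis  # serves as a negative example

# Toy example:
corpus = [("the cat sat", ["the cat sat", "a cat sat", "the cap sat"])]
print(list(make_training_pairs(corpus)))
# -> [('the cat sat', 'a cat sat'), ('the cat sat', 'the cap sat')]
```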
Discriminative Syntactic Language Modeling for Speech Recognition
TLDR
A reranking model makes use of syntactic features together with a parameter estimation method based on the perceptron algorithm, providing an additional 0.3% reduction in test-set error rate beyond the model of (Roark et al., 2004a; Roark et al., 2004b).
Discriminative training of n-gram language models for speech recognition via linear programming
TLDR
Experimental results on the SPINE1 speech recognition corpus have shown that the proposed discriminative training method can outperform the conventional discounting-based maximum likelihood estimation methods.
Discriminative Language Model With Part-of-speech for Mandarin Large Vocabulary Continuous Speech Recognition System
TLDR
A discriminative training based language model (DLM), which directly focuses on minimizing speech recognition word error rate (WER), was employed to improve the performance of the speech recognition system; the DLM with n-gram features gave a 1% absolute reduction in word error rate.
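The n-gram features such models score are simple to make concrete. A minimal sketch (illustrative names, not the paper's code): each hypothesis maps to a sparse vector of n-gram counts, and the DLM score is the dot product of those counts with learned weights.

```python
from collections import Counter

def ngram_features(words, max_order=3):
    """Count all 1..max_order grams in a tokenized hypothesis."""
    feats = Counter()
    padded = ["<s>"] + words + ["</s>"]
    for n in range(1, max_order + 1):
        for i in range(len(padded) - n + 1):
            feats[tuple(padded[i:i + n])] += 1
    return feats

def dlm_score(words, weights, max_order=3):
    """Linear DLM score: weighted sum of the hypothesis's n-gram counts."""
    return sum(c * weights.get(g, 0.0)
               for g, c in ngram_features(words, max_order).items())
```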
Large margin estimation of n-gram language models for speech recognition via linear programming
  • Vladimir Magdin, Hui Jiang
  • Computer Science
    2010 IEEE International Conference on Acoustics, Speech and Signal Processing
  • 2010
TLDR
Experimental results have shown that the proposed discriminative training method can outperform the conventional discounting-based maximum likelihood estimation methods on the SPINE1 speech recognition task.
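The linear-programming view of this training problem can be sketched directly. The following is an illustrative reconstruction under simplifying assumptions (not the papers' exact program), using scipy.optimize.linprog: the variables are n-gram weight updates and per-utterance slacks, each constraint requires the reference to outscore a competing hypothesis by a unit margin, and the total slack is minimized.

```python
import numpy as np
from scipy.optimize import linprog

def train_lp(delta_counts, bound=5.0):
    """delta_counts: (n_utts, n_feats) array; row i holds, per n-gram g,
    count(g, reference_i) - count(g, hypothesis_i)."""
    n_utts, n_feats = delta_counts.shape
    # Objective: minimize the sum of the slack variables xi.
    c = np.concatenate([np.zeros(n_feats), np.ones(n_utts)])
    # Margin constraints delta_counts @ lam + xi >= 1, rewritten in
    # linprog's A_ub @ x <= b_ub form.
    A_ub = np.hstack([-delta_counts, -np.eye(n_utts)])
    b_ub = -np.ones(n_utts)
    # Bounded weights, nonnegative slacks.
    bounds = [(-bound, bound)] * n_feats + [(0, None)] * n_utts
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n_feats]  # learned n-gram weight updates

# Toy example: 2 utterances, 3 n-gram features.
deltas = np.array([[1.0, -1.0, 0.0],
                   [0.0,  1.0, -1.0]])
print(train_lp(deltas))
```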
Maximum mutual information multi-phone units in direct modeling
TLDR
A class of discriminative features for use in maximum entropy speech recognition models is introduced: acoustic detectors for discriminatively determined multi-phone units, together with two novel classes of features based on these units, associative and transductive.
Deriving conversation-based features from unlabeled speech for discriminative language modeling
TLDR
It is shown that the perceptron algorithm can be successfully used in a semi-supervised learning (SSL) framework, where limited amounts of labeled data are available and the confidence “flows” from the unlabeled data to the labeled data, and not vice-versa, as is done traditionally in SSL.
Constrained discriminative training of N-gram language models
TLDR
This paper presents three techniques to improve the discriminative training of LMs: updating the back-off probability of unseen events, normalizing the N-gram updates to ensure a probability distribution, and applying a relative-entropy based global constraint on the N-gram probability updates.
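The normalization constraint in particular is easy to illustrate. A minimal sketch (the update rule here is illustrative, not the paper's): after additively updating the probabilities of words following a shared history h, rescale so that P(w | h) still sums to one.

```python
def renormalize(probs, updates, floor=1e-8):
    """probs: dict word -> P(word | h); updates: additive adjustments
    from discriminative training. Returns a proper distribution."""
    raw = {w: max(p + updates.get(w, 0.0), floor) for w, p in probs.items()}
    total = sum(raw.values())
    return {w: p / total for w, p in raw.items()}

# Example: boost "cat" and penalize "cap" after the history "the".
p_h = {"cat": 0.5, "cap": 0.3, "mat": 0.2}
print(renormalize(p_h, {"cat": 0.1, "cap": -0.1}))
# -> probabilities that again sum to 1.0
```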

References

Showing 1-10 of 58 references.
Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm
TLDR
This paper compares two parameter estimation methods: the perceptron algorithm, and a method based on conditional random fields (CRFs), which have the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data.
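The perceptron variant is compact enough to sketch end to end. A minimal, self-contained illustration (the names and the unigram-plus-bigram feature set are simplifications, not the paper's setup): whenever the model's top n-best candidate differs from the oracle (lowest-error) candidate, the weights move toward the oracle's features and away from the winner's.

```python
from collections import Counter

def feats(words):
    """Unigram and bigram counts (a stand-in for richer n-gram features)."""
    padded = ["<s>"] + words + ["</s>"]
    f = Counter(zip(padded, padded[1:]))   # bigrams
    f.update((w,) for w in words)          # unigrams
    return f

def score(words, weights):
    return sum(c * weights.get(g, 0.0) for g, c in feats(words).items())

def perceptron_train(data, epochs=5):
    """data: list of (candidates, oracle_index), where candidates is a
    tokenized n-best list and oracle_index marks the lowest-WER entry."""
    weights = {}
    for _ in range(epochs):
        for candidates, oracle in data:
            best = max(range(len(candidates)),
                       key=lambda i: score(candidates[i], weights))
            if best != oracle:  # standard structured-perceptron update
                for g, c in feats(candidates[oracle]).items():
                    weights[g] = weights.get(g, 0.0) + c
                for g, c in feats(candidates[best]).items():
                    weights[g] = weights.get(g, 0.0) - c
    return weights
```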
Discriminative Syntactic Language Modeling for Speech Recognition
TLDR
A reranking model makes use of syntactic features together with a parameter estimation method based on the perceptron algorithm, providing an additional 0.3% reduction in test-set error rate beyond the model of (Roark et al., 2004a; Roark et al., 2004b).
Finding consensus in speech recognition: word error minimization and other applications of confusion networks
We describe a new framework for distilling information from word lattices to improve the accuracy of the speech recognition output and obtain a more perspicuous representation of a set of alternative hypotheses.
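The decoding step over the resulting confusion network is simple to sketch (the data layout here is illustrative): each "bin" of the network holds competing words with posterior probabilities, and picking the highest-posterior entry per bin, including a possible skip, minimizes expected word error under the model.

```python
def consensus_decode(confusion_network):
    """confusion_network: list of bins, each a dict mapping a word
    (or "*eps*" for a skip) to its posterior probability."""
    output = []
    for posteriors in confusion_network:
        word = max(posteriors, key=posteriors.get)
        if word != "*eps*":          # choosing the skip deletes a word
            output.append(word)
    return output

# Example: the middle bin prefers deletion.
cn = [{"the": 0.9, "a": 0.1},
      {"*eps*": 0.6, "big": 0.4},
      {"cat": 0.7, "cap": 0.3}]
print(consensus_decode(cn))   # -> ['the', 'cat']
```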
Weighted finite-state transducers in speech recognition
TLDR
WFSTs provide a common and natural representation for hidden Markov models (HMMs), context-dependency, pronunciation dictionaries, grammars, and alternative recognition outputs, and general transducer operations combine these representations flexibly and efficiently.
Error corrective mechanisms for speech recognition
  • L. Mangu, M. Padmanabhan
  • Computer Science
    2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221)
  • 2001
TLDR
The paper uses transformation-based learning for inducing a set of rules to guide a better decision between the top two candidates with the highest posterior probabilities in each confusion set, and shows significant improvements over the consensus decoding approach.
Generalized Algorithms for Constructing Statistical Language Models
TLDR
An algorithm for computing efficiently the expected counts of any sequence in a word lattice output by a speech recognizer or any arbitrary weighted automaton is given and a new technique for creating exact representations of n-gram language models by weighted automata is described.
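The expected-count computation at the heart of this is a forward-backward pass over the lattice. A minimal sketch for a DAG-shaped word lattice follows (the data layout is illustrative; the paper works with general weighted automata):

```python
from collections import defaultdict

def expected_counts(num_states, arcs, start=0, final=None):
    """arcs: list of (src, dst, word, prob), sorted so that source
    states appear in topological order. Returns expected count per word."""
    final = num_states - 1 if final is None else final
    fwd = [0.0] * num_states
    bwd = [0.0] * num_states
    fwd[start], bwd[final] = 1.0, 1.0
    for src, dst, _, p in arcs:            # forward probabilities
        fwd[dst] += fwd[src] * p
    for src, dst, _, p in reversed(arcs):  # backward probabilities
        bwd[src] += p * bwd[dst]
    total = fwd[final]
    counts = defaultdict(float)
    for src, dst, word, p in arcs:         # posterior mass through each arc
        counts[word] += fwd[src] * p * bwd[dst] / total
    return dict(counts)

# Toy lattice with paths "a b" (prob 0.6) and "a c" (prob 0.4).
arcs = [(0, 1, "a", 1.0), (1, 2, "b", 0.6), (1, 2, "c", 0.4)]
print(expected_counts(3, arcs))  # -> {'a': 1.0, 'b': 0.6, 'c': 0.4}
```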
Large scale discriminative training for speech recognition
TLDR
These experiments represent the largest-scale application of discriminative training techniques for speech recognition, and have led to significant reductions in word error rate for both triphone and quinphone HMMs compared to the best models trained using maximum likelihood estimation.
Discriminative training on language model
TLDR
This paper proposes a discriminative training method that minimizes the recognizer's error rate rather than estimating the distribution of the training data, obtaining approximately 5%-25% recognition error reduction with discriminative training of the language model.
Discriminative training of language models for speech recognition
TLDR
This paper describes the algorithm and demonstrates modest improvements in word and sentence error rates on the DARPA Communicator task without any increase in language model complexity.
Whole-sentence exponential language models: a vehicle for linguistic-statistical integration
TLDR
An exponential language model which models a whole sentence or utterance as a single unit is introduced, and a novel procedure for feature selection is presented, which exploits discrepancies between the existing model and the training corpus.