Finding consensus in speech recognition: word error minimization and other applications of confusion networks

@article{Mangu2000FindingCI,
  title={Finding consensus in speech recognition: word error minimization and other applications of confusion networks},
  author={L. Mangu and E. Brill and A. Stolcke},
  journal={Comput. Speech Lang.},
  year={2000},
  volume={14},
  pages={373-400}
}
We describe a new framework for distilling information from word lattices to improve the accuracy of the speech recognition output and obtain a more perspicuous representation of a set of alternative hypotheses. In the standard MAP decoding approach the recognizer outputs the string of words corresponding to the path with the highest posterior probability given the acoustics and a language model. However, even given optimal models, the MAP decoder does not necessarily minimize the commonly used performance metric, word error rate (WER). …
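The decoding rule behind this framework can be sketched compactly. Once a lattice has been aligned into a confusion network, that is, a sequence of slots in which competing word hypotheses carry posterior probabilities, the consensus hypothesis is read off by taking the highest-posterior word in each slot, skipping slots where the null word wins. A minimal Python sketch follows; the slot structure and all posteriors are invented for illustration and are not data or code from the paper.

```python
# Minimal sketch of consensus decoding over a confusion network.
# A confusion network is modeled here as a sequence of "slots", each
# mapping competing word hypotheses (including the null word "-",
# i.e. a deletion) to posterior probabilities. Values are invented.
ConfusionNetwork = list[dict[str, float]]

def consensus_hypothesis(cn: ConfusionNetwork) -> list[str]:
    """Take the highest-posterior word in each slot, dropping slots
    where the null word wins."""
    words = []
    for slot in cn:
        best = max(slot, key=slot.get)   # argmax over the slot
        if best != "-":
            words.append(best)
    return words

# Hypothetical network; posteriors in each slot sum to 1.
cn = [
    {"i": 0.9, "a": 0.1},
    {"feel": 0.4, "veal": 0.35, "fill": 0.25},
    {"very": 0.6, "fairly": 0.3, "-": 0.1},
    {"fine": 0.8, "find": 0.2},
]
print(" ".join(consensus_hypothesis(cn)))  # -> "i feel very fine"
```

Because each slot is decided independently, the output need not correspond to any single path through the original lattice, which is precisely how consensus decoding can achieve a lower expected WER than MAP decoding.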
Citations

Error corrective mechanisms for speech recognition
  • L. Mangu, M. Padmanabhan
  • Computer Science
  • 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings
  • 2001
TLDR
The paper uses transformation-based learning for inducing a set of rules to guide a better decision between the top two candidates with the highest posterior probabilities in each confusion set, and shows significant improvements over the consensus decoding approach.
Semantic parsing using word confusion networks with conditional random fields
TLDR
This paper proposes to exploit word confusion networks (WCNs), compiled from ASR lattices, for both CRF modeling and decoding; WCNs provide a compact representation of multiple aligned ASR hypotheses without compromising recognition accuracy.
Conditional use of word lattices, confusion networks and 1-best string hypotheses in a sequential interpretation strategy
TLDR
This study presents a new interpretation strategy based on the sequential use of different ASR output representations: 1-best strings, word lattices and confusion networks, which significantly reduces the size of the CN obtained while improving the recognition performance.
Spoken Document Clustering Using Word Confusion Networks
TLDR
A word confusion network (WCN) based approach is used to cluster spoken documents and its ability to handle the influence of speech recognition errors is analyzed, showing up to a 4% absolute improvement in the normalized mutual information metric.
An Evaluation of Lattice Scoring using a Smoothed Estimate of Word Accuracy
  • M. Omar, L. Mangu
  • Computer Science
  • 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07
  • 2007
TLDR
It is shown that two algorithms, similar to the Viterbi and forward-backward algorithms, can be used to estimate the hypothesis that approximately maximizes this smoothed word-accuracy objective function.
Discriminative n-gram language modeling
TLDR
This paper describes a method based on regularized likelihood that makes use of the feature set given by the perceptron algorithm, and initialization with the perceptron's weights; this method gives an additional 0.5% reduction in word error rate (WER) over training with the perceptron alone.
Exploiting Future Word Contexts in Neural Network Language Models for Speech Recognition
TLDR
A novel neural network language model structure, the succeeding-word RNNLM (su-RNNLM), is proposed; it is more efficient to train than bi-directional models and can be applied to lattice rescoring.
Joint reranking of parsing and word recognition with automatic segmentation
TLDR
The results indicate that the parse language model alone provides little benefit over a large n-gram model, but adding non-local syntactic features leads to improved performance, and including alternative word-sequence hypotheses has a much greater impact on parse accuracy.
Extending boosting for call classification using word confusion networks
TLDR
A novel algorithm for exploiting ASR word confidence scores for better classification of spoken utterances is presented, along with methods for on-line and off-line score combination.
The effect of pruning and compression on graphical representations of the output of a speech recognizer
TLDR
A word graph compression algorithm is introduced that significantly reduces the number of words in the graphical representation without eliminating utterance hypotheses or distorting their acoustic scores.

References

Showing 1-10 of 38 references
Finding consensus among words: lattice-based word error minimization
TLDR
A new algorithm for finding the hypothesis in a recognition lattice that is expected to minimize the word error rate (WER) is described, which overcomes the mismatch between the word-based performance metric and the standard MAP scoring paradigm that is sentence-based.
Explicit word error minimization in n-best list rescoring
TLDR
A new algorithm is developed that explicitly minimizes the expected word error of recognition hypotheses: it approximates posterior hypothesis probabilities using N-best lists and chooses the hypothesis with the lowest expected error.
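The selection rule in this line of work can be stated concretely: score each N-best hypothesis by its expected word error against the entire list, weighting every reference candidate by its posterior, and return the minimizer. A minimal Python sketch with invented hypotheses and posteriors (not data from either paper); it also illustrates the MAP/WER mismatch noted in the previous entry, since the highest-posterior string is not the expected-error minimizer.

```python
# Sketch of explicit expected word error minimization over an N-best
# list: pick the hypothesis minimizing the posterior-weighted edit
# distance to every hypothesis in the list. All data is invented.

def edit_distance(a: list[str], b: list[str]) -> int:
    """Word-level Levenshtein distance (single-row DP)."""
    d = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, wb in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1,            # deletion
                                   d[j - 1] + 1,        # insertion
                                   prev + (wa != wb))   # substitution
    return d[len(b)]

def min_expected_wer(nbest: list[tuple[list[str], float]]) -> list[str]:
    """nbest holds (hypothesis, posterior) pairs; posteriors sum to 1."""
    return min(
        (hyp for hyp, _ in nbest),
        key=lambda h: sum(p * edit_distance(h, ref) for ref, p in nbest),
    )

# The MAP (highest-posterior) string is NOT the minimum-expected-error
# string here: expected word errors are 0.9, 0.7 and 1.1 respectively.
nbest = [
    ("a b c d".split(), 0.40),   # MAP hypothesis
    ("a x c d".split(), 0.30),
    ("a x c e".split(), 0.30),
]
print(min_expected_wer(nbest))   # -> ['a', 'x', 'c', 'd']
```

The quadratic number of edit-distance evaluations is what limits this explicit approach to short N-best lists, and it is one motivation for the lattice-based consensus approximation developed in the main paper.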
Estimating confidence using word lattices
TLDR
This work exploits word lattices as information sources for the measure-of-confidence tagger JANKA; in experiments on spontaneous human-to-human speech data, the use of word-lattice-related information significantly improves tagging accuracy.
Posterior probability decoding, confidence estimation and system combination
TLDR
The word lattices produced by the Viterbi decoder were used to generate confusion networks, which provide a compact representation of the most likely word hypotheses and their associated word posterior probabilities.
Large vocabulary decoding and confidence estimation using word posterior probabilities
  • Gunnar Evermann, P. Woodland
  • Computer Science
  • 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings
  • 2000
The paper investigates the estimation of word posterior probabilities based on word lattices and presents applications of these posteriors in a large vocabulary speech recognition system. …
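Word posteriors of this kind are typically obtained with a forward-backward pass over the lattice: the posterior of an arc is the probability mass of all paths through it divided by the total lattice probability, and a word's posterior accumulates the posteriors of arcs carrying that word at overlapping times. A toy Python sketch follows; the five-node lattice and its plain probabilities (standing in for scaled acoustic and language model scores) are invented for illustration.

```python
# Sketch of arc posterior estimation on a toy acyclic word lattice
# via the forward-backward algorithm. Arcs are listed so that every
# arc's source node precedes its target (topological order); all
# scores are invented.
from collections import defaultdict

# (source_node, target_node, word, arc_probability)
arcs = [
    (0, 1, "i",    0.9), (0, 1, "a",    0.1),
    (1, 2, "feel", 0.5), (1, 2, "fill", 0.3), (1, 3, "veal", 0.2),
    (2, 4, "fine", 1.0), (3, 4, "fine", 1.0),
]
start, end = 0, 4

alpha = defaultdict(float)         # prob. mass of paths start -> node
alpha[start] = 1.0
for s, t, _, p in arcs:            # forward pass
    alpha[t] += alpha[s] * p

beta = defaultdict(float)          # prob. mass of paths node -> end
beta[end] = 1.0
for s, t, _, p in reversed(arcs):  # backward pass
    beta[s] += p * beta[t]

total = alpha[end]                 # total lattice probability
for s, t, word, p in arcs:
    posterior = alpha[s] * p * beta[t] / total
    print(f"{word:4s} {s}->{t}  posterior = {posterior:.2f}")
```

Note that the two "fine" arcs receive partial posteriors (0.8 and 0.2) whose sum, 1.0, is the posterior of the word itself; accumulating arc posteriors per word is the step that both confidence estimation and confusion network construction build on.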
Lattice Compression in the Consensual Post-Processing Framework
TLDR
This paper shows how the outcome of this method for identifying mutually supporting and competing word hypotheses in a recognition lattice can be used to compress lattices, yielding better compression than the conventionally used technique.
Minimum Bayes-risk automatic speech recognition
TLDR
This paper provides experimental results showing that both the A* and N-best list rescoring implementations of minimum-risk classifiers yield better recognition accuracy than the commonly used maximum a posteriori probability (MAP) classifier in word transcription and in the identification of keywords.
A comparison of word graph and n-best list based confidence measures
TLDR
It is shown that posterior word probabilities computed on word graphs and N-best lists clearly outperform non-probabilistic confidence measures such as acoustic stability and hypothesis density.
THE SRI MARCH 2000 HUB-5 CONVERSATIONAL SPEECH TRANSCRIPTION SYSTEM
TLDR
SRI's large vocabulary conversational speech recognition system as used in the March 2000 NIST Hub-5E evaluation is described, and a generalized ROVER algorithm is applied to combine the N-best hypotheses from several systems based on different acoustic models.
LVCSR log-likelihood ratio scoring for keyword spotting
  • M. Weintraub
  • Computer Science
  • 1995 International Conference on Acoustics, Speech, and Signal Processing
  • 1995
A new scoring algorithm has been developed for generating wordspotting hypotheses and their associated scores. This technique uses a large-vocabulary continuous speech recognition (LVCSR) system to …