Bayes risk decoding and its application to system combination


Speech recognition is the task of converting an acoustic signal containing speech into written text. The error of a speech recognition system is measured as the number of words in which the recognized and the spoken text differ. This work investigates and develops decoding and system combination approaches within the Bayes risk decoding framework with the objective of reducing the number of word errors. The investigated approaches are computationally too expensive to be applied in the speech decoder. Instead, the result of a first recognition run is used, which narrows down the set of hypotheses and provides the result in a compact form, the word lattice. In the single system decoding task a single word lattice is given, and in the lattice-based system combination task a word lattice is provided by each system. In both cases the goal is to minimize the number of word errors in the final hypothesis. In large vocabulary continuous speech recognition (LVCSR) tasks the number of word errors is computed as the Levenshtein distance between recognized and spoken text. Given the true sentence posterior probabilities, the Bayes risk decoding framework yields the hypothesis with the least expected number of errors with respect to a specified loss function. However, the true probabilities are not known, and computing the Bayes risk hypothesis with the Levenshtein distance as loss function is not computationally feasible for a word lattice. Consequently, in lattice-based Bayes risk decoding and system combination two problems have to be addressed: first, how to compute an estimate for the sentence posterior probabilities given one or several word lattices; second, how to approximate the Levenshtein distance such that the computation of the Bayes risk hypothesis becomes computationally feasible.
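To make the decision rule concrete, the following is a minimal sketch of Bayes risk decoding with the word-level Levenshtein distance as loss function. It assumes a small N-best list with toy posterior scores instead of a word lattice, which is exactly the setting where exhaustive computation is still feasible; the function names `levenshtein` and `bayes_risk_decode` and the example sentences are illustrative assumptions and are not taken from this work.

```python
from typing import Dict, Sequence, Tuple

Sentence = Tuple[str, ...]


def levenshtein(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level edit distance; substitution, insertion, deletion cost 1 each."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # reference word has no counterpart
                curr[j - 1] + 1,         # hypothesis word has no counterpart
                prev[j - 1] + (r != h),  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1]


def bayes_risk_decode(nbest: Dict[Sentence, float]) -> Tuple[Sentence, float]:
    """Return the candidate with the least expected Levenshtein loss.

    `nbest` maps each hypothesis to an (unnormalized) posterior estimate.
    The candidates and the summation space are both restricted to the list,
    an assumption made here to keep the example exhaustive.
    """
    total = sum(nbest.values())
    best, best_risk = None, float("inf")
    for cand in nbest:
        risk = sum(p / total * levenshtein(ref, cand) for ref, p in nbest.items())
        if risk < best_risk:
            best, best_risk = cand, risk
    return best, best_risk


# Toy posteriors (assumed values): the most probable sentence is an outlier.
nbest = {
    ("x", "y", "z"): 0.40,
    ("a", "b", "c"): 0.35,
    ("a", "b", "d"): 0.25,
}
best, risk = bayes_risk_decode(nbest)
```

In this toy example the hypothesis with the highest posterior, `("x", "y", "z")`, is not selected, because the two remaining hypotheses are close to each other and together hold more probability mass. The cost of the sketch grows quadratically with the number of hypotheses, which illustrates why an approximation of the Levenshtein distance is needed once the hypothesis space is a full word lattice.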
Based on the separation of the posterior probability computation and the loss function in the Bayes risk decoding rule, a framework will be developed that covers the common approaches to lattice-based system combination, like ROVER, CNC, and DMC. Furthermore, it will be shown that the common approximations of the Levenshtein distance used in LVCSR tasks can be classified into two categories for which efficient Bayes risk decoders exist. The existing approximations will be investigated and compared. New loss functions will be developed which overcome drawbacks of the existing approximations to the Levenshtein distance, like the frequently observed deletion bias. A data structure of particular interest is the confusion network (CN). In previous work it was shown that a CN has a simple decoding rule in the Bayes risk framework. In this work new algorithms for deriving a CN from a word lattice will be developed and compared to existing methods. Furthermore, the CN will be the basis for several investigations aiming at improving the posterior probability estimates and the approximation of the Levenshtein distance. The methods looked into include classifier-based system combination and the use of a windowed Levenshtein distance as loss function for the Bayes risk decoder. A further topic of research is log-linear model combination, for which the enhancement with model- and word-dependent scaling factors will be investigated. The methods are tested on the Chinese speech recognition systems used by RWTH Aachen in the GALE project and on the lattices provided within the English track of the 2007 TC-Star EPPS evaluation. The best performing system combination methods investigated in this work improve the error rates by up to 10% relative for intra-site combination experiments and by more than 20% relative for cross-site combinations compared to the best single system.
The newly developed methods show a slight improvement over the existing approaches to lattice decoding and lattice-based system combination.
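The simple decoding rule for a confusion network mentioned above can be sketched as follows. This is a hedged illustration of the rule as it is commonly stated in the literature (choose the word with the highest posterior in each slot independently), not the specific algorithms developed in this work. The slot representation as a list of dictionaries, the `<eps>` label for the empty word, and all posterior values are assumptions made for the example.

```python
from typing import Dict, List

EPS = "<eps>"  # assumed label for the empty word, i.e. "no word in this slot"


def decode_confusion_network(cn: List[Dict[str, float]]) -> List[str]:
    """Pick the highest-posterior entry of every slot and drop empty words."""
    words = []
    for slot in cn:
        best = max(slot, key=slot.get)  # slot-wise decision, independent of neighbors
        if best != EPS:
            words.append(best)
    return words


# Toy confusion network with three slots (assumed posteriors).
cn = [
    {"the": 0.90, "a": 0.10},
    {"cat": 0.40, "cap": 0.35, EPS: 0.25},
    {EPS: 0.60, "sat": 0.40},
]
result = decode_confusion_network(cn)
```

In the last slot the empty word wins over `sat`, so the decoded sentence is shorter than the longest path through the network. Slots in which the empty word collects most of the probability mass are one place where a tendency toward deletions can show up in this kind of decoder.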


Cite this paper

@inproceedings{Hoffmeister2011BayesRD, title={Bayes risk decoding and its application to system combination}, author={Bj{\"o}rn Hoffmeister}, year={2011} }