• Publications
Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
TLDR
An important feature of the method is that arbitrary adaptation data can be used: no special enrolment sentences are needed, and adaptation performance improves as more data becomes available.
The HTK book
TLDR
The Fundamentals of HTK: General Principles of HMMs, Recognition and Viterbi Decoding, and Continuous Speech Recognition.
The HTK book version 3.4
Minimum Phone Error and I-smoothing for improved discriminative training
TLDR
The Minimum Phone Error (MPE) and Minimum Word Error (MWE) criteria are smoothed approximations to the phone and word error rates respectively; I-smoothing is a novel technique for smoothing discriminative training criteria using statistics from maximum likelihood estimation (MLE).
Mean and variance adaptation within the MLLR framework
TLDR
This paper examines the maximum likelihood linear regression (MLLR) adaptation technique, previously applied to the mean parameters of mixture-Gaussian HMM systems, and extends it to also update the Gaussian variances; re-estimation formulae are derived for these variance transforms.
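As a rough illustration (not the paper's implementation), the MLLR mean update applies a shared affine transform to every Gaussian mean in a regression class, mapping each mean mu to A @ mu + b; the function name and toy values below are hypothetical:

```python
import numpy as np

def apply_mllr_mean_transform(means, A, b):
    """Apply a shared affine MLLR transform (A, b) to an array of
    Gaussian mean vectors, one mean per row: mu -> A @ mu + b."""
    means = np.asarray(means, dtype=float)
    return means @ A.T + b  # row-wise A @ mu + b

# Toy 2-D example: identity rotation plus a bias shift.
A = np.eye(2)
b = np.array([1.0, -0.5])
means = np.array([[0.0, 0.0],
                  [2.0, 2.0]])
adapted = apply_mllr_mean_transform(means, A, b)
print(adapted)  # [[ 1.  -0.5]
                #  [ 3.   1.5]]
```

In practice A and b are estimated from the adaptation data by maximising the likelihood over all Gaussians tied to the transform, which is what lets arbitrary adaptation data update many parameters at once.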
Tree-based state tying for high accuracy acoustic modelling
TLDR
This paper describes a method of creating a tied-state continuous speech recognition system using a phonetic decision tree, which is shown to lead to similar recognition performance to that obtained using an earlier data-driven approach but to have the additional advantage of providing a mapping for unseen triphones.
Large scale discriminative training of hidden Markov models for speech recognition
TLDR
It is shown that HMMs trained with MMIE benefit as much as MLE-trained HMMs from applying model adaptation using maximum likelihood linear regression (MLLR), which has allowed the straightforward integration of MMIE-trained HMMs into complex multi-pass systems for transcription of conversational telephone speech.
Posterior probability decoding, confidence estimation and system combination
TLDR
The word lattices produced by the Viterbi decoder were used to generate confusion networks, which provide a compact representation of the most likely word hypotheses and their associated word posterior probabilities.
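A minimal sketch of consensus decoding over such a confusion network (the data layout here is an assumption, not the paper's code): each slot maps candidate words, plus an epsilon entry for a skip, to word posterior probabilities, and the output hypothesis takes the highest-posterior entry in each slot:

```python
# Hypothetical confusion-network layout: a list of slots, each a dict
# mapping a word (or "<eps>" for a skip arc) to its posterior.

def consensus_decode(confusion_network):
    """Pick the highest-posterior word in each slot, dropping skips."""
    hyp = []
    for slot in confusion_network:
        word = max(slot, key=slot.get)  # argmax over posteriors
        if word != "<eps>":             # epsilon means "no word here"
            hyp.append(word)
    return hyp

cn = [
    {"the": 0.9, "a": 0.1},
    {"cat": 0.6, "hat": 0.3, "<eps>": 0.1},
    {"<eps>": 0.7, "sat": 0.3},
]
print(consensus_decode(cn))  # ['the', 'cat']
```

The slot posteriors also double as word confidence scores, which is what makes confusion networks useful for confidence estimation and system combination.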
Flexible speaker adaptation using maximum likelihood linear regression
...