Hywel B. Richards

Conversational speech recognition is a challenging problem primarily because speakers rarely fully articulate sounds. A successful speech recognition approach must infer intended spectral targets from the speech data, or develop a method of dealing with large variances in the data. Hidden Dynamic Models (HDMs) attempt to learn such targets automatically in…
This paper introduces a new approach to acoustic-phonetic modelling, the Hidden Dynamic Model (HDM), which explicitly accounts for coarticulation and the transitions between neighbouring phones. Inspired by the fact that speech is produced by an underlying dynamic system, the HDM consists of a single vector target per phone in a hidden dynamic space…
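The idea of one target per phone, with a hidden trajectory that moves smoothly between targets, can be illustrated with a small sketch. The abstract does not specify the HDM's dynamics, so this assumes simple first-order target-directed dynamics with a fixed rate; the function name `hdm_trajectory` and its parameters are hypothetical, and the mapping from the hidden space to acoustics is omitted.

```python
from typing import List, Sequence

Vector = List[float]


def hdm_trajectory(targets: Sequence[Vector], durations: Sequence[int],
                   rate: float, start: Vector) -> List[Vector]:
    """Generate a hidden trajectory that moves toward each phone's target.

    Assumed (illustrative) first-order dynamics, not the paper's exact model:
        x[t] = x[t-1] + rate * (target - x[t-1])
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    x = list(start)
    traj: List[Vector] = []
    for target, dur in zip(targets, durations):
        # Each frame of this phone moves a fixed fraction of the way to its target.
        for _ in range(dur):
            x = [xi + rate * (ti - xi) for xi, ti in zip(x, target)]
            traj.append(x)
    return traj


print(hdm_trajectory([[1.0], [0.0]], [2, 2], rate=0.5, start=[0.0]))
```

With a rate of 0.5 and two-frame segments, the trajectory only reaches 0.75 before heading to the next target, so the first target is undershot: a simple picture of how short segments and neighbouring phones shape the observed trajectory.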
This paper describes recent changes in Dragon's speech recognition system which have markedly improved performance on conversational telephone speech. Key changes include: the conversion to modified PLP-based cepstra from mel-cepstra; the replacement of our usual IMELDA transformation by a new transform using "semi-tied covariance"; a new multi-pass…
Compared with the usual acoustic representations, articulatory models offer potential benefits in giving a compact, slowly changing representation, having a closer relationship with the phonetic domain and allowing a straightforward treatment of coarticulation and transitional effects. A long-standing difficulty preventing the realisation of these benefits is the…
A new approach is described which estimates vocal tract shape sequences for speech consisting of voiceless speech and periods of silence as well as voiced speech. This method, based on the use of articulatory codebooks, has proved successful in identifying the place position of stops and fricatives. Secondly, we focus on voiced speech in particular. A fast…
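A common way to turn an articulatory codebook into a shape sequence is to search the codebook frame by frame while penalising large jumps in shape, solved with dynamic programming. The sketch below shows that generic technique only; the truncated abstract does not say how the paper searches its codebook, and the names `codebook_track` and `continuity` are hypothetical. Each codebook entry is assumed to be a (shape, acoustic vector) pair.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Entry = Tuple[Vector, Vector]  # (vocal tract shape, acoustic vector); assumed layout


def sqdist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def codebook_track(frames: Sequence[Vector], codebook: Sequence[Entry],
                   continuity: float) -> List[Vector]:
    """Pick one codebook shape per frame by dynamic programming.

    Path cost = sum of acoustic mismatches
                + continuity * sum of squared shape jumps between frames.
    """
    if not frames or not codebook:
        raise ValueError("frames and codebook must be non-empty")
    k = len(codebook)
    # cost[j]: best total cost of any path ending in entry j at the current frame.
    cost = [sqdist(frames[0], acoustic) for _, acoustic in codebook]
    backpointers: List[List[int]] = []
    for frame in frames[1:]:
        new_cost: List[float] = []
        pointers: List[int] = []
        for shape_j, acoustic_j in codebook:
            # Cheapest predecessor, including the penalty for moving to shape_j.
            trans = [cost[i] + continuity * sqdist(codebook[i][0], shape_j)
                     for i in range(k)]
            best_i = min(range(k), key=trans.__getitem__)
            new_cost.append(trans[best_i] + sqdist(frame, acoustic_j))
            pointers.append(best_i)
        cost = new_cost
        backpointers.append(pointers)
    # Trace back from the cheapest final entry.
    j = min(range(k), key=cost.__getitem__)
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [codebook[j][0] for j in path]
```

The continuity weight resolves the many-to-one problem: two entries with near-identical acoustics but very different shapes are disambiguated by preferring the shape closest to its neighbours in time.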
The objective of this work is a computationally efficient method for inferring vocal tract shape trajectories from acoustic speech signals. We use an MLP to model the vocal tract shape-to-acoustics mapping, then in an analysis-by-synthesis approach, optimise an objective function that includes both the accuracy of the spectrum approximation and the credibility…
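The analysis-by-synthesis loop can be sketched as gradient descent on a shape trajectory, where each step synthesises acoustics from the current shapes and compares them with the observed ones. This is a simplified illustration under stated assumptions: the trained MLP is replaced by caller-supplied functions `forward` and `dforward` (hypothetical names), each frame has a single scalar shape parameter, and, because the abstract is truncated, the credibility term is assumed here to be a first-difference smoothness penalty.

```python
from typing import Callable, List, Sequence


def analysis_by_synthesis(
    observed: Sequence[float],
    forward: Callable[[float], float],
    dforward: Callable[[float], float],
    smooth_weight: float,
    step: float = 0.05,
    iters: int = 500,
) -> List[float]:
    """Gradient descent on a per-frame scalar shape parameter track.

    Assumed objective (illustrative):
        J(x) = sum_t (forward(x_t) - observed_t)^2
               + smooth_weight * sum_{t>0} (x_t - x_{t-1})^2
    """
    n = len(observed)
    x = [0.0] * n
    for _ in range(iters):
        grad = [0.0] * n
        for t in range(n):
            # Spectral-accuracy term: mismatch between synthesised and observed acoustics.
            grad[t] += 2.0 * (forward(x[t]) - observed[t]) * dforward(x[t])
            # Assumed credibility term: penalise abrupt frame-to-frame shape changes.
            if t > 0:
                grad[t] += 2.0 * smooth_weight * (x[t] - x[t - 1])
            if t < n - 1:
                grad[t] += 2.0 * smooth_weight * (x[t] - x[t + 1])
        x = [xt - step * g for xt, g in zip(x, grad)]
    return x


# Toy linear "synthesiser" standing in for the MLP: acoustics = 2 * shape.
print(analysis_by_synthesis([2.0, 4.0, 6.0], lambda s: 2.0 * s, lambda s: 2.0, 0.0))
```

With `smooth_weight` set to zero the estimate simply inverts the forward model frame by frame; raising it trades spectral accuracy for a smoother, more plausible trajectory. The fixed step size suits this toy linear mapping; a real MLP would need a step chosen for its gradients.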
This paper describes a cross-validation method to determine the appropriate weight with which dynamic constraints should be applied when estimating vocal tract shapes from speech. This data-dependent method can estimate the weighting without the need for separate prior knowledge of the source and noise statistics. The principles are first demonstrated on a…
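The general principle of choosing a constraint weight by cross-validation can be sketched as follows: hold out some frames, estimate the track from the remaining frames with a candidate weight, and score how well the held-out frames are predicted; the weight with the lowest held-out error wins. This is a generic sketch, not the paper's procedure. It assumes an identity mapping (an observed parameter track stands in for the acoustics), a first-difference dynamic constraint, interleaved folds, and a fixed candidate grid; all function names are hypothetical.

```python
from typing import List, Sequence


def solve_tridiagonal(lower: Sequence[float], diag: Sequence[float],
                      upper: Sequence[float], rhs: Sequence[float]) -> List[float]:
    """Thomas algorithm; row i: lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i]."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def smooth_track(y: Sequence[float], weights: Sequence[float], lam: float) -> List[float]:
    """Exactly minimise sum_t w_t (x_t - y_t)^2 + lam * sum_{t>0} (x_t - x_{t-1})^2."""
    n = len(y)
    # Setting the gradient to zero gives a symmetric tridiagonal linear system.
    diag = [w + lam * ((t > 0) + (t < n - 1)) for t, w in enumerate(weights)]
    off = [-lam] * n
    rhs = [w * yt for w, yt in zip(weights, y)]
    return solve_tridiagonal(off, diag, off, rhs)


def cv_error(y: Sequence[float], lam: float, folds: int = 3) -> float:
    """Mean squared error on held-out frames; every frame is held out exactly once."""
    if lam <= 0:
        raise ValueError("lam must be positive so held-out frames are determined")
    n = len(y)
    total = 0.0
    for f in range(folds):
        # Held-out frames get zero data weight and are filled in by the constraint.
        weights = [0.0 if t % folds == f else 1.0 for t in range(n)]
        x = smooth_track(y, weights, lam)
        total += sum((x[t] - y[t]) ** 2 for t in range(n) if t % folds == f)
    return total / n


def choose_weight(y: Sequence[float], candidates: Sequence[float], folds: int = 3) -> float:
    """Return the candidate constraint weight with the lowest cross-validation error."""
    return min(candidates, key=lambda lam: cv_error(y, lam, folds))
```

On a track that alternates around a constant (pure frame-to-frame noise), heavy smoothing predicts held-out frames best and a large weight is selected; on a clean linear ramp, a light weight wins because strong constraints flatten the genuine movement.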