Learn More
In this paper, we develop di€erent mathematical models in the framework of the multi-stream paradigm for noise robust automatic speech recognition (ASR), and discuss their close relationship with human speech perception. Largely inspired by Fletcher's ``product-of-errors'' rule (PoE rule) in psychoacoustics, multi-band ASR aims for robustness to data(More)
Feature projection by non-linear discriminant analysis (NLDA) can substantially increase classification performance. In automatic speech recognition (ASR) the projection provided by the pre-squashed outputs from a one hidden layer multi-layer perceptron (MLP) trained to recognise speech sub-units (phonemes) has previously been shown to significantly(More)
Traditional microphone array speech recognition systems simply recognise the enhanced output of the array. As the level of signal enhancement depends on the number of microphones, such systems do not achieve acceptable speech recognition performance for arrays having only a few microphones. For small microphone arrays, we instead propose using the enhanced(More)
In the " missing data " (MD) approach to noise robust automatic speech recognition (ASR), speech models are trained on clean data, and during recognition sections of spectral data dominated by noise are detected and treated as " missing ". However, this all-or-nothing hard decision about which data is missing does not accurately reflect the probabilistic(More)
Sequence recognition performance is often summarised first in terms of the number of hits (H), substitutions (S), deletions (D) and insertions (I), and then as a single statistic by the " word error rate " WER = 100(S+D+I)/(H+S+D). While in common use, WER has two disadvantages as a performance measure. One is that it has no upper bound, so it doesn't tell(More)
The performance of most ASR systems degrades rapidly with data mismatch relative to the data used in training. Under many realistic noise conditions a significant proportion of the spectral representation of a speech signal, which is highly redundant, remains uncorrupted. In the " missing feature " approach to this problem mismatching data is simply(More)
Speech synthesis by unit selection requires the segmentation of a large single speaker high quality recording. Automatic speech recognition techniques, e.g. Hidden Markov Models (HMM), can be optimised for maximum segmentation accuracy. This paper presents the results of tuning such a phoneme segmentation system. Firstly, using no text transcription, the(More)