Autocorrelation-based Methods for Noise-Robust Speech Recognition

Abstract

One major concern in the design of speech recognition systems is their performance in real environments, where various sources may interfere with the speech signal. The effects of such sources can generally be classified as additive noise and channel distortion. As the names imply, noise is usually considered additive in the spectral domain, while channel distortion is multiplicative there and therefore appears as an additive term in the logarithmic spectrum. Both can severely degrade the performance of automatic speech recognition (ASR) systems. Thus, in recent years, a substantial amount of research has been devoted to improving the performance of ASR systems in such environments.
The main approaches to improving the performance of ASR systems can be roughly divided into three categories: robust speech feature extraction, speech enhancement, and model-based compensation for noise. The main goal of robust speech feature extraction techniques is to find a set of parameters for representing the speech signal in the ASR system that is robust against variations in the speech signal due to noise or channel distortion. Extensive research has resulted in such well-known techniques as RASTA filtering (Hermansky & Morgan, 1994), cepstral mean normalization (CMN) (Kermorvant, 1999), dynamic spectral features (Furui, 1986), short-time modified coherence (SMC) (Mansour & Juang, 1989a), one-sided autocorrelation LPC (OSALPC) (Hernando & Nadeu, 1997), differential power spectrum (DPS) (Chen et al., 2003), and relative autocorrelation sequence (RAS) (Yuo & Wang, 1998, 1999).
In the case of speech enhancement, some initial information about the speech and the noise is needed so that the noise can be estimated and the noisy speech cleaned up. Widely used methods in this category include spectral subtraction (SS) (Beh & Ko, 2003; Boll, 1979) and Wiener filtering (Lee et al., 1996).
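The distinction between additive noise and channel distortion can be illustrated with a small numerical sketch. It is not taken from the paper: the toy spectra, the channel gains, the noise level, and the helper names `log_spectrum` and `mean_normalize` are all assumptions made for illustration. It assumes a time-invariant channel and stationary noise that adds in the power-spectral domain. Subtracting the per-bin time average of the log spectrum (the idea behind CMN, which applies the same subtraction to cepstra) cancels the channel term exactly, whereas additive noise is not removed this way.

```python
import math

def log_spectrum(power_frames):
    """Element-wise natural log of a list of power-spectrum frames."""
    return [[math.log(p) for p in frame] for frame in power_frames]

def mean_normalize(log_frames):
    """Subtract the per-bin mean over time from each frame.

    This mirrors the idea of cepstral mean normalization, applied
    here directly to log spectra for simplicity (an assumption).
    """
    n_frames = len(log_frames)
    n_bins = len(log_frames[0])
    means = [sum(f[k] for f in log_frames) / n_frames for k in range(n_bins)]
    return [[f[k] - means[k] for k in range(n_bins)] for f in log_frames]

def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two frame lists."""
    return max(abs(x - y) for fa, fb in zip(a, b) for x, y in zip(fa, fb))

# Hypothetical toy data: 3 frames x 4 frequency bins of clean power spectrum.
clean = [[1.0, 2.0, 4.0, 8.0],
         [2.0, 1.0, 8.0, 4.0],
         [4.0, 4.0, 2.0, 2.0]]
channel = [0.5, 2.0, 0.25, 4.0]  # assumed time-invariant channel gain per bin
noise = [0.1, 0.1, 0.1, 0.1]     # assumed stationary additive noise power per bin

# Channel distortion multiplies each bin; noise adds to each bin.
distorted = [[c * h for c, h in zip(frame, channel)] for frame in clean]
noisy = [[c + n for c, n in zip(frame, noise)] for frame in clean]

norm_clean = mean_normalize(log_spectrum(clean))
norm_distorted = mean_normalize(log_spectrum(distorted))
norm_noisy = mean_normalize(log_spectrum(noisy))

# The channel becomes a constant offset in the log spectrum, so it cancels.
max_channel_error = max_abs_diff(norm_clean, norm_distorted)
# Additive noise changes the log spectrum nonlinearly, so a mismatch remains.
max_noise_error = max_abs_diff(norm_clean, norm_noisy)

print("residual after channel distortion:", max_channel_error)
print("residual after additive noise:", max_noise_error)
```

The first residual is zero up to floating-point rounding, while the second is not, which is one way to see why additive noise calls for different treatment than channel distortion.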
In the framework of model-based compensation, statistical models such as Hidden Markov Models (HMMs) are usually considered. The compensation techniques try to remove the mismatch between the trained models and the noisy speech to improve the performance of ASR systems. Methods such as parallel model combination (PMC) (Gales & Young, 1995, 1996), vector Taylor series (VTS) (Acero et al., 2000; Kim et al., 1998; Moreno, 1996; Moreno et al.,

Cite this paper

@inproceedings{Farahani2012AutocorrelationbasedMF, title={Autocorrelation-based Methods for Noise-Robust Speech Recognition}, author={Gholamreza Farahani and Mohammad Saeed Ahadi and Mohammad Mehdi Homayounpour}, year={2012} }