Michael Kleinschmidt

Learn More
Psychoacoustical and neurophysiological results indicate that spectro-temporal modulations play an important role in sound perception. Speech signals, in particular, exhibit distinct spectro-temporal patterns which are well matched by receptive fields of cortical neurons. In order to improve the performance of automatic speech recognition (ASR) systems a(More)
Recent results from physiological and psychoacoustic studies indicate that spectrally and temporally localized time-frequency envelope patterns form a relevant basis of auditory perception. This motivates new approaches to feature extraction for automatic speech recognition (ASR) which utilize two-dimensional spectro-temporal modulation filters. The paper(More)
A novel type of feature extraction for automatic speech recognition is investigated. Two-dimensional Gabor functions, with varying extents and tuned to different rates and directions of spectro-temporal modulation, are applied as filters to a spectro-temporal representation provided by mel spectra. The use of these functions is motivated by findings in(More)
A main task for computational auditory scene analysis (CASA) is to separate several concurrent speech sources. From psychoa-coustics it is known that common onsets, common amplitude modulation and sound source direction are among the important cues which allow the separation for the human auditory system. A new algorithm is presented here, that performs(More)
A novel noise suppression scheme for speech signals is proposed which is based on a neurophysiologically-motivated estimation of the local signal-to-noise ratio (SNR) in different frequency channels. For SNR-estimation, the input signal is transformed into so-called Amplitude Modulation Spectrograms (AMS), which represent both spectral and temporal(More)
In this paper a new approach is presented for estimating the long-term speech-to-noise ratio (SNR) in individual frequency bands that is based on methods known from automatic speech recognition (ASR). It uses a model of auditory perception as front end, physiologically and psychoacoustically motivated sigma-pi cells as secondary features, and a linear or(More)
In order to enhance automatic speech recognition performance in adverse conditions, localized spectro-temporal features (LSTF) are investigated, which are motivated by physiological measurements in the primary auditory cortex. In the Aurora2 experimental setup, Gabor-shaped LSTFs combined with a Tandem system yield robust performance with a feature set size(More)
  • 1