Timo Gerkmann

Learn More
Recently, it has been proposed to estimate the noise power spectral density by means of minimum mean-square error (MMSE) optimal estimation. We show that the resulting estimator can be interpreted as a voice activity detector (VAD)-based noise power estimator, where the noise power is updated only when speech absence is signaled, compensated with a required(More)
In this paper, we analyze the minimum mean square error (MMSE) based spectral noise power estimator [1] and present an improvement. We will show that the MMSE based spectral noise power estimate is only updated when the a posteriori signal-to-noise ratio (SNR) is lower than one. This threshold on the a posteriori SNR can be interpreted as a voice activity(More)
In this contribution we present an improved estimator for the speech presence probability at each time-frequency point in the short-time Fourier-transform domain. In contrast to existing approaches this estimator does not rely on an adaptively estimated and thus signal dependent a priori signal-to-noise ratio estimate. It therefore decouples the estimation(More)
The enhancement of speech which is corrupted by noise is commonly performed in the short-time discrete Fourier transform domain. In case only a single microphone signal is available, typically only the spectral amplitude is modified. However, it has recently been shown that an improved spectral phase can as well be utilized for speech enhancement, e.g., for(More)
While state-of-the-art approaches obtain an estimate of the a priori SNR by adaptively smoothing its maximum likelihood estimate in the frequency domain, we selectively smooth the maximum likelihood estimate in the cepstral domain. In the cepstral domain the noisy speech signal is decomposed into coefficients related mainly to the speech envelope, the(More)
Several binaural audio signal enhancement algorithms were evaluated with respect to their potential to improve speech intelligibility in noise for users of bilateral cochlear implants (CIs). 50% speech reception thresholds (SRT50) were assessed using an adaptive procedure in three distinct, realistic noise scenarios. All scenarios were highly nonstationary,(More)
In this paper, we derive the signal power bias that arises when spectral amplitudes are smoothed by reducing their variance in the cepstral domain (often referred to as <i>cepstral smoothing</i>) and develop a power bias compensation method. We show that if chi-distributed spectral amplitudes are smoothed in the cepstral domain, the resulting smoothed(More)
The quality of speech signals recorded in an enclosure can be severely degraded by room reverberation. In this paper, we focus on a class of blind batch methods for speech dereverberation in a noiseless scenario with a single source, which are based on multi-channel linear prediction in the short-time Fourier transform domain. Dereverberation is performed(More)