In recent years, substantial progress has been made in the field of reverberant speech signal processing, including both single- and multichannel dereverberation techniques and automatic speech recognition (ASR) techniques that are robust to reverberation. In this paper, we describe the REVERB challenge, which is an evaluation campaign that was designed to…
In this paper we present an analytic derivation of the moments of the phase factor between clean speech and noise cepstral or log-mel-spectral feature vectors. The derivation shows, among other things, that the probability density of the phase factor is sub-Gaussian and that it is independent of the noise type and the signal-to-noise ratio, however…
In this contribution we extend a previously proposed Bayesian approach for the enhancement of reverberant logarithmic mel power spectral coefficients for robust automatic speech recognition to the additional compensation of background noise. A recently proposed observation model is employed whose time-variant observation error statistics are obtained as a…
The accuracy of automatic speech recognition systems in noisy and reverberant environments can be improved notably by exploiting the uncertainty of the estimated speech features using so-called uncertainty-of-observation techniques. In this paper, we introduce a new Bayesian decision rule that can serve as a mathematical framework from which both known and…
The Amigo Context Management Service (CMS) provides an open infrastructure for the exchange of contextual information between context sources and context clients. Whereas context sources supply context information retrieved from sensors or services within the networked home environment, context clients use that information to become context-aware. An…
In this paper we present a system for identifying and localizing speakers using distant microphone arrays and a steerable pan-tilt-zoom camera. The scenario at hand assumes that audio streams are processed in real time to obtain the diarization information "who spoke when and where" with only short delays. Our new idea is to fuse the acoustical and visual…
In this contribution we investigate the effectiveness of Bayesian feature enhancement (BFE) on a medium-sized recognition task containing real-world recordings of noisy reverberant speech. BFE employs a very coarse model of the acoustic impulse response (AIR) from the source to the microphone, which has been shown to be effective if the speech to be…
In this contribution we present a theoretical and experimental investigation into the effects of reverberation and noise on features in the logarithmic mel power spectral domain, an intermediate stage in the computation of the mel frequency cepstral coefficients, prevalent in automatic speech recognition (ASR). Gaining insight into the complex interaction…
In this work, a novel approach for the initialization of switching linear dynamic models (SLDMs) as dynamic models for the trajectory of speech features is proposed. Borrowing ideas from the "k-means++" algorithm, the goal of this approach is to find distinctly different SLDMs, modelling the complex dynamics of the speech features, already at the…