In this paper we present an analytic derivation of the moments of the phase factor between clean speech and noise cepstral or log-mel-spectral feature vectors. The development shows, among other things, that the probability density of the phase factor is of sub-Gaussian nature and that it is independent of the noise type and the signal-to-noise ratio, however …
In this contribution we extend a previously proposed Bayesian approach for the enhancement of reverberant logarithmic mel power spectral coefficients for robust automatic speech recognition to the additional compensation of background noise. A recently proposed observation model is employed whose time-variant observation error statistics are obtained as a …
The Amigo Context Management Service (CMS) provides an open infrastructure for the exchange of contextual information between context sources and context clients. Whereas context sources supply context information, retrieved from sensors or services within the networked home environment, context clients utilize that information to become context-aware. An …
In this paper we present a system for identifying and localizing speakers using distant microphone arrays and a steerable pan-tilt-zoom camera. The scenario at hand assumes that audio streams are processed in real time to obtain the diarization information "who spoke when and where" with only short delays. Our new idea is to fuse the acoustical and visual …
The accuracy of automatic speech recognition systems in noisy and reverberant environments can be improved notably by exploiting the uncertainty of the estimated speech features using so-called uncertainty-of-observation techniques. In this paper, we introduce a new Bayesian decision rule that can serve as a mathematical framework from which both known and …
In this contribution we investigate the effectiveness of Bayesian feature enhancement (BFE) on a medium-sized recognition task containing real-world recordings of noisy reverberant speech. BFE employs a very coarse model of the acoustic impulse response (AIR) from the source to the microphone, which has been shown to be effective if the speech to be …
In this contribution we present a theoretical and experimental investigation into the effects of reverberation and noise on features in the logarithmic mel power spectral domain, an intermediate stage in the computation of the mel frequency cepstral coefficients, prevalent in automatic speech recognition (ASR). Gaining insight into the complex interaction …
In this work, a novel approach for the initialization of switching linear dynamic models (SLDMs) as dynamic models for the trajectory of speech features is proposed. Borrowing ideas from the "k-means++" algorithm, the goal of this approach is to find distinctly different SLDMs, modelling the complex dynamics of the speech features, already at the …
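For readers unfamiliar with the "k-means++" idea referenced in the abstract above, the following is a minimal sketch of the standard k-means++ seeding step on plain feature vectors. It is not the authors' SLDM initialization, which the truncated abstract does not spell out; the function names `squared_distance` and `kmeanspp_seeds` and the toy data are hypothetical and chosen only for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng=None):
    """Pick k seeds with standard k-means++ seeding (illustrative only).

    The first seed is drawn uniformly; each further seed is drawn with
    probability proportional to its squared distance to the nearest seed
    chosen so far, which favours seeds that are far apart.
    """
    rng = rng or random.Random(0)  # fixed seed only for reproducibility
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        d2 = [min(squared_distance(p, s) for s in seeds) for p in points]
        if sum(d2) == 0.0:
            # All remaining points coincide with a seed: fall back to uniform.
            seeds.append(rng.choice(points))
            continue
        seeds.append(rng.choices(points, weights=d2, k=1)[0])
    return seeds


# Hypothetical toy data: three loose groups of 2-D feature vectors.
toy_points = [
    (0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
    (10.0, 0.0), (10.1, 0.2), (9.8, -0.1),
]
toy_seeds = kmeanspp_seeds(toy_points, k=3)
```

Because already chosen points receive zero weight, the seeds are distinct whenever the input points are distinct; the distance weighting makes widely separated seeds likely but does not guarantee one seed per group.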
In this work, a splitting and weighting scheme that allows for splitting a Gaussian density into a Gaussian mixture density (GMM) is extended to allow the mixture components to be arranged along arbitrary directions. The parameters of the Gaussian mixture are chosen such that the GMM and the original Gaussian still exhibit equal central moments up to an …
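To make the idea of a moment-preserving split along a chosen direction concrete, here is a minimal sketch of one simple construction: a Gaussian is split into two equally weighted components displaced by plus and minus a step along a direction vector, and the component covariance is reduced so that the mixture keeps the original mean and covariance. This is an assumed textbook-style construction, not the scheme of the paper, whose weights, number of components, and matched moment order are not given in the truncated abstract. The names `split_gaussian`, `mixture_moments`, and the numeric values are hypothetical.

```python
def split_gaussian(mean, cov, direction, step):
    """Split N(mean, cov) into two equally weighted Gaussians.

    The components sit at mean +/- step * direction and share the
    covariance cov - shift shift^T. The caller must choose `step` small
    enough that this matrix stays positive semi-definite (not checked).
    """
    n = len(mean)
    shift = [step * d for d in direction]
    comp_cov = [[cov[i][j] - shift[i] * shift[j] for j in range(n)]
                for i in range(n)]
    plus = [m + s for m, s in zip(mean, shift)]
    minus = [m - s for m, s in zip(mean, shift)]
    return [(0.5, plus, comp_cov), (0.5, minus, comp_cov)]


def mixture_moments(components):
    """Mean and covariance of a Gaussian mixture given (weight, mean, cov)."""
    n = len(components[0][1])
    mean = [sum(w * m[i] for w, m, _ in components) for i in range(n)]
    cov = [[sum(w * (c[i][j] + (m[i] - mean[i]) * (m[j] - mean[j]))
                for w, m, c in components)
            for j in range(n)]
           for i in range(n)]
    return mean, cov


# Hypothetical 2-D example, split along the unit direction (0.6, 0.8).
orig_mean = [1.0, -2.0]
orig_cov = [[2.0, 0.5], [0.5, 1.0]]
parts = split_gaussian(orig_mean, orig_cov, direction=[0.6, 0.8], step=0.5)
mix_mean, mix_cov = mixture_moments(parts)
```

In this sketch the mixture mean and covariance agree with the original Gaussian by construction, since the spread introduced by displacing the components exactly replaces the covariance removed from each of them.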