Combined Estimation of Spectral Envelopes and Sound Source Direction of Concurrent Voices by Multidimensional Statistical Filtering

Abstract

A key question for speech enhancement and for simulations of auditory scene analysis at high levels of nonstationary noise is how to combine principles of auditory grouping and how to integrate several noise-perturbed acoustical cues robustly. We present an application of recent online, nonlinear, non-Gaussian multidimensional statistical filtering methods that integrates tracking of sound-source direction with tracking of the spectro-temporal dynamics of two mixed voices. The framework is consistent with the notion of evaluating competing hypotheses. To limit the number of hypotheses that need to be evaluated, the approach uses a detailed statistical description of the high-dimensional spectro-temporal dynamics of speech, measured from a large speech database. The results show that the algorithm tracks sound-source directions very precisely, separates the voice envelopes with algorithmic convergence times down to 50 ms, and enhances the signal-to-noise ratio in adverse conditions, at the cost of high computational effort. The approach has high potential for efficiency improvements and could be applied to voice separation and to the reduction of nonstationary noise.
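
To illustrate the general idea of online statistical filtering over competing hypotheses, the following is a minimal sketch of a bootstrap particle filter, one common sequential Monte Carlo method for nonlinear, non-Gaussian tracking. It is not the authors' algorithm: the one-dimensional azimuth state, the random-walk dynamics, the Gaussian likelihood, the function name `bootstrap_particle_filter`, and all parameter values are assumptions chosen for illustration. The method summarized above instead tracks direction jointly with high-dimensional spectral envelopes, using dynamics measured from a speech database.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              process_std=2.0, obs_std=10.0, seed=0):
    """Track a slowly drifting azimuth (degrees) from noisy observations.

    Illustrative assumptions: random-walk dynamics, Gaussian observation
    noise, and hypothetical parameter values.
    """
    rng = random.Random(seed)
    # Initial hypotheses: uniform over the frontal half-plane.
    particles = [rng.uniform(-90.0, 90.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: propagate each hypothesis through the assumed dynamics.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Weight: score each hypothesis against the noisy observation.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2)
                   for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Estimate: posterior mean of the direction.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample: keep hypotheses in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

Each particle plays the role of one competing hypothesis about the source direction. The number of particles needed grows quickly with the dimension of the state, which is one reason an informative statistical model of the dynamics helps limit the number of hypotheses to evaluate.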

DOI: 10.1109/TASL.2006.889788

Cite this paper

@article{Nix2007CombinedEO,
  title={Combined Estimation of Spectral Envelopes and Sound Source Direction of Concurrent Voices by Multidimensional Statistical Filtering},
  author={Johannes Nix and Volker Hohmann},
  journal={IEEE Transactions on Audio, Speech, and Language Processing},
  year={2007},
  volume={15},
  pages={995-1008}
}