Combined Estimation of Spectral Envelopes and Sound Source Direction of Concurrent Voices by Multidimensional Statistical Filtering


A key question for speech enhancement and simulations of auditory scene analysis in high levels of nonstationary noise is how to combine principles of auditory grouping and to integrate several noise-perturbed acoustical cues in a robust way. We present an application of recent online, nonlinear, non-Gaussian multidimensional statistical filtering methods that integrates tracking of sound-source direction and spectro-temporal dynamics of two mixed voices. The framework used is in agreement with the notion of evaluating competing hypotheses. To limit the number of hypotheses that need to be evaluated, the approach developed here uses a detailed statistical description of the high-dimensional spectro-temporal dynamics of speech, which is measured from a large speech database. The results show that the algorithm tracks sound-source directions very precisely, separates the voice envelopes with algorithmic convergence times down to 50 ms, and enhances the signal-to-noise ratio in adverse conditions, although it requires high computational effort. The approach has high potential for efficiency improvements and could be applied to voice separation and reduction of nonstationary noises.
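The idea of evaluating competing hypotheses with an online, nonlinear, non-Gaussian filter can be illustrated with a generic bootstrap particle filter. The sketch below is not the algorithm of the paper: it tracks a single scalar azimuth only, and the random-walk dynamics, the Gaussian observation noise, the parameter values, and the function names (bootstrap_particle_filter, demo) are all assumptions chosen for illustration.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              process_std=2.0, obs_std=10.0, seed=0):
    """Track a slowly drifting azimuth (degrees) from noisy observations.

    Illustrative assumptions (not from the paper): random-walk dynamics,
    Gaussian observation noise, multinomial resampling at every step.
    """
    rng = random.Random(seed)
    # Initial hypotheses: azimuths spread uniformly over -90..90 degrees.
    particles = [rng.uniform(-90.0, 90.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: propagate each hypothesis through the assumed dynamics.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Weight: score each hypothesis by the likelihood of the observation.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed: fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Estimate: posterior mean under the weighted particle set.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample: keep likely hypotheses, discard unlikely ones.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


def demo(true_azimuth=30.0, n_steps=50, obs_std=10.0, seed=1):
    """Run the filter on synthetic noisy observations of a fixed azimuth."""
    rng = random.Random(seed)
    observations = [true_azimuth + rng.gauss(0.0, obs_std)
                    for _ in range(n_steps)]
    return bootstrap_particle_filter(observations, obs_std=obs_std)
```

Calling demo() returns one azimuth estimate per observation. Each particle plays the role of one competing hypothesis, and the cost grows with the number of particles, which is consistent with the abstract's remark about computational effort. A joint state that also covers spectral envelopes would be far higher-dimensional than this scalar example.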

DOI: 10.1109/TASL.2006.889788

