Binaural Detection, Localization, and Segregation in Reverberant Environments Based on Joint Pitch and Azimuth Cues

Abstract

We propose an approach to binaural detection, localization, and segregation of speech based on pitch and azimuth cues. We formulate the problem as a search through a multisource state space across time, where each multisource state encodes the number of active sources and the azimuth and pitch of each active source. A set of multilayer perceptrons is trained to assign time-frequency units to one of the active sources in each multisource state, based jointly on observed pitch and azimuth cues. We develop a novel hidden Markov model framework to estimate the most probable path through the multisource state space. An estimated state path encodes a solution to the detection, localization, pitch estimation, and simultaneous organization problems. Segregation is then achieved with an azimuth-based sequential organization stage. We demonstrate that the proposed framework improves segregation relative to several two-microphone comparison systems based solely on azimuth cues. Performance gains are consistent across a variety of reverberant conditions.
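The core of the approach, as described in the abstract, is a search for the most probable path through the multisource state space, with per-frame scores reflecting how well each time-frequency unit's observed pitch and azimuth cues match one of the hypothesized sources. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: the toy state list, the `unit_score` stand-in for the trained multilayer perceptrons, and the transition penalty are assumptions made for this example only.

```python
# Illustrative sketch (assumptions throughout): Viterbi decoding over a toy
# "multisource state space", where each state is a tuple of active sources
# and each source is an (azimuth in degrees, pitch in Hz) pair.
import itertools
import numpy as np

AZIMUTHS = (-30.0, 0.0, 30.0)
PITCHES = (120.0, 220.0)
SOURCES = [(az, f0) for az in AZIMUTHS for f0 in PITCHES]
# The empty tuple models "no active source"; one- and two-source states follow.
STATES = [()] + [(s,) for s in SOURCES] + list(itertools.combinations(SOURCES, 2))


def unit_score(obs_azimuth, obs_pitch, source):
    """Hypothetical stand-in for the trained MLP: how well one T-F unit's
    observed cues match a hypothesized source (azimuth, pitch)."""
    az, f0 = source
    return -0.5 * ((obs_azimuth - az) / 10.0) ** 2 \
           - 0.5 * ((obs_pitch - f0) / 20.0) ** 2


def frame_log_likelihood(frame_obs, state):
    """Assign each T-F unit to its best-matching source in the state and sum
    the log scores; a flat penalty stands in for the silence-state model."""
    if not state:
        return -1.0 * len(frame_obs)
    return sum(max(unit_score(az, f0, src) for src in state)
               for az, f0 in frame_obs)


def viterbi(observations, switch_penalty=2.0):
    """Most probable state path; transitions lightly penalize changing the
    number of active sources between frames (an assumed transition model)."""
    n_states = len(STATES)
    delta = np.full((len(observations), n_states), -np.inf)
    back = np.zeros((len(observations), n_states), dtype=int)
    for j, s in enumerate(STATES):
        delta[0, j] = frame_log_likelihood(observations[0], s)
    for t in range(1, len(observations)):
        for j, s in enumerate(STATES):
            trans = np.array([-switch_penalty * abs(len(s) - len(STATES[i]))
                              for i in range(n_states)])
            scores = delta[t - 1] + trans
            back[t, j] = int(np.argmax(scores))
            delta[t, j] = scores[back[t, j]] + frame_log_likelihood(observations[t], s)
    path = [int(np.argmax(delta[-1]))]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t, path[-1]])
    return [STATES[j] for j in reversed(path)]


# Example: three frames of observed (azimuth, pitch) cues, one pair per T-F unit.
frames = [
    [(-28.0, 118.0), (-31.0, 122.0)],              # one talker near -30 degrees
    [(-29.0, 121.0), (2.0, 215.0), (1.0, 224.0)],  # a second talker appears near 0 degrees
    [(0.5, 219.0)],                                 # the first talker falls silent
]
for t, state in enumerate(viterbi(frames)):
    print(f"frame {t}: {state}")
```

In the paper's framework the decoded state path simultaneously answers the detection (how many sources), localization (their azimuths), and pitch estimation questions, and the per-unit source assignments provide the simultaneous organization used for segregation; the sketch only mirrors that structure at a toy scale.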

DOI: 10.1109/TASL.2012.2236316


Cite this paper

@article{Woodruff2013BinauralDL,
  title   = {Binaural Detection, Localization, and Segregation in Reverberant Environments Based on Joint Pitch and Azimuth Cues},
  author  = {John Woodruff and DeLiang Wang},
  journal = {IEEE Transactions on Audio, Speech, and Language Processing},
  year    = {2013},
  volume  = {21},
  pages   = {806--815}
}