Detecting multiple, simultaneous talkers through localising speech recorded by ad-hoc microphone arrays
This paper presents a new statistical model of binaural signals and applies it to efficient binaural source separation. Binaural source separation must preserve the spatial cues of the separated sound, such as the head-related transfer function (HRTF). However, using an HRTF directly is impractical because it is normally not known in advance. To address this problem, we first focus on the difference between the signal probability density functions at the two ears, which can be estimated blindly using our previous work on higher-order statistics. We then derive a generalized minimum mean-square error short-time spectral amplitude estimator that preserves sound localization. Objective and subjective experiments show the efficacy of the proposed method in terms of spatial quality.
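The abstract does not give the estimator itself, so the following is only an illustrative sketch of one general way a spectral-amplitude gain can leave localization cues intact: applying the same real, non-negative gain to the left-ear and right-ear spectra in each time-frequency bin. It is not the proposed generalized minimum mean-square error estimator. The function names `apply_common_gain` and `interaural_cues`, the random test spectra, and the random gain values are all hypothetical and chosen purely for illustration.

```python
import numpy as np


def apply_common_gain(left, right, gain):
    """Scale both ear spectra by the same real, non-negative gain per bin.

    Hypothetical helper: the gain here is arbitrary, not the paper's estimator.
    """
    gain = np.asarray(gain, dtype=float)
    return gain * left, gain * right


def interaural_cues(left, right, eps=1e-12):
    """Return (level difference in dB, phase difference in radians) per bin."""
    ild = 20.0 * np.log10((np.abs(left) + eps) / (np.abs(right) + eps))
    ipd = np.angle(left * np.conj(right))
    return ild, ipd


# Synthetic complex spectra standing in for left/right STFT frames (assumption).
rng = np.random.default_rng(0)
shape = (4, 8)  # (frames, frequency bins)
left = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
right = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Arbitrary per-bin gain in (0, 1], standing in for a noise-suppression gain.
gain = rng.uniform(0.1, 1.0, size=shape)

out_left, out_right = apply_common_gain(left, right, gain)

ild_in, ipd_in = interaural_cues(left, right)
ild_out, ipd_out = interaural_cues(out_left, out_right)

# Compare phases on the unit circle so wrap-around at +/- pi cannot matter.
ild_preserved = np.allclose(ild_in, ild_out, atol=1e-6)
ipd_preserved = np.allclose(np.exp(1j * ipd_in), np.exp(1j * ipd_out))
print("level difference preserved:", ild_preserved)
print("phase difference preserved:", ipd_preserved)
```

Because the gain is real and identical at both ears, it cancels in the magnitude ratio and does not alter the phase of the cross-spectrum, which is why both interaural cues survive. Independent per-ear gains would generally change the level difference.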