Audio Source Separation in Reverberant Environments Using $\beta$-Divergence-Based Nonnegative Factorization
This paper presents a new source separation method that extracts a target sound using a spatial cue, either provided by a user or derived from accompanying images. The algorithm is based on non-negative tensor factorization (NTF), which decomposes a multichannel spectrogram, treated as a three-way tensor, into three matrices. One of these matrices encodes spatial information; its components are associated with the spatial cue and thereby indicate which time-frequency bins of the spectrogram should be given preference. When a spatial cue is available, the method offers a clear advantage over conventional PARAFAC-NTF in both computational cost and separation quality, as measured by the source-to-distortion ratio (SDR), source-to-interference ratio (SIR), and source-to-artifacts ratio (SAR).
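To make the factorization concrete, the following is a minimal sketch of PARAFAC-style NTF under the β-divergence with standard multiplicative updates. It assumes a nonnegative channels × frequencies × frames tensor V modeled as V[i, f, t] ≈ Σ_k Q[i, k] W[f, k] H[t, k], where Q is the spatial matrix. How the spatial cue enters is an assumption made for illustration only: the first `n_target` columns of Q are fixed to a user-supplied channel-gain vector `cue`, and the target is recovered with a Wiener-style mask. The names `ntf_beta`, `target_mask`, `cue`, and `n_target` are hypothetical; this is not the paper's exact algorithm.

```python
import numpy as np


def beta_divergence(V, Vh, beta):
    """Sum of elementwise beta-divergences d_beta(V | Vh)."""
    if beta == 1:  # generalized Kullback-Leibler divergence
        return float(np.sum(V * np.log(V / Vh) - V + Vh))
    if beta == 0:  # Itakura-Saito divergence
        return float(np.sum(V / Vh - np.log(V / Vh) - 1))
    return float(np.sum((V ** beta + (beta - 1) * Vh ** beta
                         - beta * V * Vh ** (beta - 1)) / (beta * (beta - 1))))


def ntf_beta(V, K, beta=1.0, n_iter=200, cue=None, n_target=1, seed=0, eps=1e-12):
    """PARAFAC NTF of V (I x F x T) with multiplicative beta-divergence updates.

    Hypothetical cue handling: if `cue` (length-I nonnegative vector) is given,
    the first `n_target` columns of the spatial matrix Q are fixed to it.
    """
    rng = np.random.default_rng(seed)
    I, F, T = V.shape
    Q = rng.random((I, K)) + 0.1  # spatial (channel-gain) matrix
    W = rng.random((F, K)) + 0.1  # spectral bases
    H = rng.random((T, K)) + 0.1  # temporal activations
    free = np.ones(K, dtype=bool)  # columns of Q that are updated
    if cue is not None:
        cue = np.asarray(cue, dtype=float)
        Q[:, :n_target] = (cue / cue.sum())[:, None]
        free[:n_target] = False

    def model():
        # Reconstructed tensor sum_k Q[i,k] W[f,k] H[t,k], kept strictly positive.
        return np.einsum('ik,fk,tk->ift', Q, W, H) + eps

    costs = [beta_divergence(V, model(), beta)]
    for _ in range(n_iter):
        Vh = model()
        W *= (np.einsum('ift,ik,tk->fk', V * Vh ** (beta - 2), Q, H)
              / (np.einsum('ift,ik,tk->fk', Vh ** (beta - 1), Q, H) + eps))
        Vh = model()
        H *= (np.einsum('ift,ik,fk->tk', V * Vh ** (beta - 2), Q, W)
              / (np.einsum('ift,ik,fk->tk', Vh ** (beta - 1), Q, W) + eps))
        Vh = model()
        ratio = (np.einsum('ift,fk,tk->ik', V * Vh ** (beta - 2), W, H)
                 / (np.einsum('ift,fk,tk->ik', Vh ** (beta - 1), W, H) + eps))
        Q[:, free] *= ratio[:, free]  # cue-fixed columns stay untouched
        costs.append(beta_divergence(V, model(), beta))
    return Q, W, H, costs


def target_mask(Q, W, H, n_target, eps=1e-12):
    """Wiener-style mask: share of the model attributed to the target components."""
    full = np.einsum('ik,fk,tk->ift', Q, W, H) + eps
    tgt = np.einsum('ik,fk,tk->ift', Q[:, :n_target], W[:, :n_target], H[:, :n_target])
    return tgt / full


# Usage on a synthetic two-channel tensor built from random nonnegative factors.
rng = np.random.default_rng(1)
I, F, T, K = 2, 20, 30, 3
V = np.einsum('ik,fk,tk->ift', rng.random((I, K)) + 0.1,
              rng.random((F, K)) + 0.1, rng.random((T, K)) + 0.1)
Q, W, H, costs = ntf_beta(V, K=K, beta=1.0, n_iter=100, cue=[0.9, 0.1])
mask = target_mask(Q, W, H, n_target=1)
print(costs[0], costs[-1])
```

In practice the mask would be applied to each channel's complex STFT rather than to the power tensor itself. Fixing the cue-aligned columns of Q also removes one block of updates, which is one plausible reading of the computational saving the abstract mentions.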