Sound source separation based on non-negative tensor factorization incorporating spatial cue as prior knowledge

Abstract

This paper presents a source separation method that extracts a target sound using a spatial cue supplied either by a user or derived from accompanying images. The algorithm is based on non-negative tensor factorization (NTF), which decomposes a multichannel spectrogram into three matrices. The components of one of these matrices represent spatial information and are associated with the spatial cue, thereby indicating which bins of the spectrogram should be given preference. When a spatial cue is available, the method offers a clear advantage over conventional PARAFAC-NTF in both computational cost and separation quality, as measured by metrics including the signal-to-distortion ratio (SDR), signal-to-interference ratio (SIR) and signal-to-artifacts ratio (SAR).
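PARAFAC-NTF models each entry of a C × F × T non-negative multichannel spectrogram as a sum of K rank-one terms, X[c][f][t] ≈ Σ_k Q[c][k] W[f][k] H[t][k], where Q holds per-channel (spatial) gains, W spectral patterns and H temporal activations. The pure-Python sketch below illustrates that model and one way a spatial cue could act as prior knowledge. It is not the authors' algorithm: the Euclidean cost, the multiplicative updates, the choice to fix the spatial gains of the first `n_target` components to a cue vector, the Wiener-like soft mask, and all function names (`ntf_with_spatial_cue`, `extract_target`, etc.) are assumptions made for illustration.

```python
import random

EPS = 1e-12  # guards against division by zero in the updates and the mask


def reconstruct(Q, W, H, comps=None):
    """PARAFAC model: Xhat[c][f][t] = sum over k of Q[c][k] * W[f][k] * H[t][k]."""
    ks = list(range(len(Q[0]))) if comps is None else list(comps)
    return [[[sum(Q[c][k] * W[f][k] * H[t][k] for k in ks)
              for t in range(len(H))]
             for f in range(len(W))]
            for c in range(len(Q))]


def sq_error(X, Xh):
    """Squared Euclidean distance between two C x F x T nested lists."""
    return sum((x - y) ** 2
               for xc, yc in zip(X, Xh)
               for xf, yf in zip(xc, yc)
               for x, y in zip(xf, yf))


def ntf_with_spatial_cue(X, K, cue, n_target, n_iter=200, seed=0):
    """Euclidean PARAFAC-NTF with multiplicative updates (hypothetical cue scheme).

    Components 0..n_target-1 get their spatial gains Q[:][k] set to `cue`
    and are never updated, so they can only model energy with that panning.
    """
    C, F, T = len(X), len(X[0]), len(X[0][0])
    rng = random.Random(seed)
    Q = [[rng.random() + 0.1 for _ in range(K)] for _ in range(C)]  # spatial gains
    W = [[rng.random() + 0.1 for _ in range(K)] for _ in range(F)]  # spectral patterns
    H = [[rng.random() + 0.1 for _ in range(K)] for _ in range(T)]  # activations
    for c in range(C):
        for k in range(n_target):
            Q[c][k] = cue[c]
    history = [sq_error(X, reconstruct(Q, W, H))]
    for _ in range(n_iter):
        # Spectral update: W[f][k] *= <X, Q*H> / <Xhat, Q*H> over (c, t).
        Xh = reconstruct(Q, W, H)
        for f in range(F):
            for k in range(K):
                num = sum(X[c][f][t] * Q[c][k] * H[t][k] for c in range(C) for t in range(T))
                den = sum(Xh[c][f][t] * Q[c][k] * H[t][k] for c in range(C) for t in range(T))
                W[f][k] *= num / (den + EPS)
        # Temporal update, same rule over (c, f).
        Xh = reconstruct(Q, W, H)
        for t in range(T):
            for k in range(K):
                num = sum(X[c][f][t] * Q[c][k] * W[f][k] for c in range(C) for f in range(F))
                den = sum(Xh[c][f][t] * Q[c][k] * W[f][k] for c in range(C) for f in range(F))
                H[t][k] *= num / (den + EPS)
        # Spatial update over (f, t), skipping the cue-constrained components.
        Xh = reconstruct(Q, W, H)
        for c in range(C):
            for k in range(n_target, K):
                num = sum(X[c][f][t] * W[f][k] * H[t][k] for f in range(F) for t in range(T))
                den = sum(Xh[c][f][t] * W[f][k] * H[t][k] for f in range(F) for t in range(T))
                Q[c][k] *= num / (den + EPS)
        history.append(sq_error(X, reconstruct(Q, W, H)))
    return Q, W, H, history


def extract_target(X, Q, W, H, n_target):
    """Soft-mask the mixture: X * Xhat_target / Xhat (Wiener-like filtering)."""
    Xh = reconstruct(Q, W, H)
    Xt = reconstruct(Q, W, H, comps=range(n_target))
    return [[[X[c][f][t] * Xt[c][f][t] / (Xh[c][f][t] + EPS)
              for t in range(len(X[0][0]))]
             for f in range(len(X[0]))]
            for c in range(len(X))]


if __name__ == "__main__":
    # Toy 2-channel mixture: a left-panned target plus a right-panned background.
    cue, bg = [1.0, 0.2], [0.2, 1.0]
    Wt, Ht = [1.0, 0.6, 0.2, 0.1], [0.9, 1.0, 0.4, 0.2, 0.8]
    Wb, Hb = [0.1, 0.3, 0.8, 1.0], [0.3, 0.2, 1.0, 0.9, 0.5]
    X = [[[cue[c] * Wt[f] * Ht[t] + bg[c] * Wb[f] * Hb[t] for t in range(5)]
          for f in range(4)] for c in range(2)]
    Q, W, H, hist = ntf_with_spatial_cue(X, K=2, cue=cue, n_target=1)
    target = extract_target(X, Q, W, H, n_target=1)
    print(f"squared error: {hist[0]:.4f} -> {hist[-1]:.6f}")
```

Because the cue-tied spatial gains are held fixed, only the remaining components' gains are estimated, which is one simple way to reduce the search space when the target's panning is known in advance. Whether this matches how the paper links the spatial matrix to the cue is not stated in the abstract.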

DOI: 10.1109/ICASSP.2013.6637611


3 Figures and Tables

Cite this paper

@inproceedings{Mitsufuji2013SoundSS,
  title={Sound source separation based on non-negative tensor factorization incorporating spatial cue as prior knowledge},
  author={Yuki Mitsufuji and Axel R{\"{o}}bel},
  booktitle={2013 IEEE International Conference on Acoustics, Speech and Signal Processing},
  year={2013},
  pages={71--75},
  doi={10.1109/ICASSP.2013.6637611}
}