Sparse Overcomplete Decomposition for Single Channel Speaker Separation


We present an algorithm for separating multiple speakers from a mixed single-channel recording. The algorithm builds on a model proposed by Raj and Smaragdis (2005). The idea is to extract characteristic spectro-temporal basis functions from training data for each individual speaker and to decompose the mixed signal as a linear combination of these learned bases. In other words, their model extracts a compact code of basis functions that explains the space spanned by the spectral vectors of a speaker. Our model instead generates a sparse distributed code that is overcomplete, that is, it has more basis functions than the dimensionality of the space. We propose a probabilistic framework to achieve sparsity. Experiments show that the resulting sparse code better captures the structure in the data and hence leads to better separation.
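The decomposition idea can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in non-negative matrix factorization with multiplicative KL-divergence updates, and it uses a simple L1 penalty on the activations as a stand-in for the probabilistic sparsity framework mentioned above. The function names (`kl_nmf`, `separate`), the `sparsity` parameter, and the synthetic "speakers" are all assumptions made for illustration. Bases are learned per speaker with more bases (24) than frequency bins (16), then held fixed while the mixture is decomposed.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def kl_nmf(V, n_bases, n_iter=200, sparsity=0.0, W=None, seed=0):
    """Approximate V (freq x time, non-negative) as W @ H.

    Uses multiplicative KL-divergence updates. `sparsity` is an L1
    penalty on H (an illustrative stand-in, not the paper's prior).
    If W is supplied it is held fixed and only H is estimated.
    """
    rng = np.random.default_rng(seed)
    learn_W = W is None
    if learn_W:
        W = rng.random((V.shape[0], n_bases)) + EPS
        W /= W.sum(axis=0, keepdims=True)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        if learn_W:
            R = V / (W @ H + EPS)
            W *= (R @ H.T) / (H.sum(axis=1)[None, :] + EPS)
            # Unit-sum columns keep the L1 penalty from being evaded by rescaling.
            W /= W.sum(axis=0, keepdims=True) + EPS
        R = V / (W @ H + EPS)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + sparsity)
    return W, H


def separate(V_mix, W_a, W_b, n_iter=200, sparsity=0.0):
    """Decompose a mixture over the concatenated speaker bases and
    split it with a soft mask built from each speaker's reconstruction."""
    W = np.hstack([W_a, W_b])
    _, H = kl_nmf(V_mix, W.shape[1], n_iter, sparsity, W=W)
    k = W_a.shape[1]
    A = W_a @ H[:k]
    B = W_b @ H[k:]
    total = A + B + EPS
    return V_mix * A / total, V_mix * B / total


# Synthetic magnitude "spectrograms": speaker A is active in the low
# bins, speaker B in the high bins (purely illustrative data).
rng = np.random.default_rng(1)
n_freq, n_time, n_bases = 16, 60, 24  # 24 bases > 16 dimensions: overcomplete


def fake_speaker(lo, hi):
    V = np.full((n_freq, n_time), 0.01)
    V[lo:hi] += rng.random((hi - lo, n_time))
    return V


W_a, _ = kl_nmf(fake_speaker(0, 8), n_bases, sparsity=0.1)
W_b, _ = kl_nmf(fake_speaker(8, 16), n_bases, sparsity=0.1, seed=2)

V_mix = fake_speaker(0, 8) + fake_speaker(8, 16)
est_a, est_b = separate(V_mix, W_a, W_b, sparsity=0.1)
```

Because the two estimates come from a soft mask, they sum back to the mixture by construction; how well each one matches its true source depends on the learned bases and on the sparsity setting, which this toy example does not attempt to tune.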

DOI: 10.1109/ICASSP.2007.366317

Cite this paper

@article{Shashanka2007SparseOD, title={Sparse Overcomplete Decomposition for Single Channel Speaker Separation}, author={Madhusudana V. S. Shashanka and Bhiksha Raj and Paris Smaragdis}, journal={2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07}, year={2007}, volume={2}, pages={II-641-II-644} }