Deep clustering: Discriminative embeddings for segmentation and separation
We previously introduced a framework called time-domain spectrogram factorization (TSF), which realizes nonnegative matrix factorization (NMF)-like source separation in the time domain. While retaining the ability of NMF to obtain a parts-based representation of magnitude spectra, this framework allows us to (i) avoid the assumption, common in NMF approaches, that the magnitude spectra of source components are additive and (ii) take into account the interdependence of the phase/amplitude components at different time-frequency points. The second point in particular has been overlooked despite its potential importance. Our previous study revealed that the conventional TSF algorithm was relatively slow because it requires large matrix inversions, and that stopping the algorithm early often resulted in poor separation accuracy. To overcome this problem, this paper presents an iterative TSF solver using projected gradient updates. Simulation results show that the proposed TSF approach yields higher source separation performance than NMF and its variants, including the original TSF.
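As a rough illustration of what a projected gradient update looks like, the sketch below applies it to a plain NMF problem on a nonnegative matrix. It is not the TSF solver presented in this paper: it operates on a magnitude-like matrix rather than on time-domain signals, and it assumes a squared Euclidean cost, alternating updates, and a step size of one over the Lipschitz constant of each block gradient. The function name `nmf_projected_gradient` and all parameter names are hypothetical.

```python
import numpy as np


def nmf_projected_gradient(V, rank, n_iter=200, seed=0):
    """Approximate a nonnegative matrix V by W @ H with W, H >= 0.

    Illustrative sketch only (assumed cost: 0.5 * ||V - W H||_F^2).
    Each block takes one gradient step and is then projected onto the
    nonnegative orthant by clipping negative entries to zero.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank))
    H = rng.random((rank, n_frames))
    errors = []
    for _ in range(n_iter):
        # Update W: the gradient is (W H - V) H^T, and its Lipschitz
        # constant is the spectral norm of H H^T.
        lipschitz_w = np.linalg.norm(H @ H.T, 2) + 1e-12
        W = np.maximum(W - ((W @ H - V) @ H.T) / lipschitz_w, 0.0)
        # Update H: the gradient is W^T (W H - V), and its Lipschitz
        # constant is the spectral norm of W^T W.
        lipschitz_h = np.linalg.norm(W.T @ W, 2) + 1e-12
        H = np.maximum(H - (W.T @ (W @ H - V)) / lipschitz_h, 0.0)
        errors.append(0.5 * np.linalg.norm(V - W @ H) ** 2)
    return W, H, errors


if __name__ == "__main__":
    # Synthetic nonnegative data standing in for a magnitude spectrogram.
    rng = np.random.default_rng(1)
    V = rng.random((20, 3)) @ rng.random((3, 30))
    W, H, errors = nmf_projected_gradient(V, rank=3)
    print(f"cost after first iteration: {errors[0]:.4f}")
    print(f"cost after last iteration:  {errors[-1]:.4f}")
```

With these step sizes each block update cannot increase the cost, so the recorded values in `errors` are non-increasing. Each step uses only matrix products and a clipping operation, with no matrix inversion.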