Transductive convolutive nonnegative matrix factorization for speech separation
Nonnegative matrix factorization (NMF) is an effective speech separation approach that extracts discriminative components for different speakers. However, traditional NMF models only the additive combination of components and ignores the temporal dependencies within speech. Convolutive NMF (CNMF) captures these dependencies with components that span several overlapping time frames, and it achieves better separation performance. Both NMF and CNMF learn each speaker's dictionary without using the mixture. As a result, they cannot exploit the information in the test mixture even when it is available, which limits how accurately the dictionaries are learned. To handle this problem, transductive NMF (TNMF) was proposed. It learns dictionaries jointly from each speaker's speech and the mixture, which yields more meaningful speaker features and significantly improves speech separation. CNMF therefore accounts for the dependencies of speech signals but ignores the benefit of mixtures in learning dictionaries, whereas TNMF emphasizes transductive dictionary learning but does not consider the dependencies within speech. This paper proposes transductive convolutive NMF (TCNMF) to overcome the deficiencies of both CNMF and TNMF. Experimental results show that our method significantly outperforms the aforementioned NMF-based methods.
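The "overlapping components" idea behind CNMF can be made concrete with the generic convolutive reconstruction model, V ≈ sum over t = 0..T-1 of W_t · shift_t(H). Here each component has a separate basis W_t for each of T consecutive frames, and shift_t moves the activation matrix H right by t frames. Below is a minimal pure-Python sketch of that model only. The helper names (`matmul`, `shift_right`, `cnmf_reconstruct`, `squared_error`) are hypothetical. The sketch does not implement the paper's TCNMF, its transductive dictionary learning, or any update rules.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b for list-of-lists matrices."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def shift_right(h: Matrix, t: int) -> Matrix:
    """Shift the columns of H right by t frames, zero-filling the first t columns."""
    shifted = []
    for row in h:
        n = len(row)
        k = min(t, n)  # never shift by more than the number of frames
        shifted.append([0.0] * k + row[:n - k])
    return shifted


def cnmf_reconstruct(bases: List[Matrix], h: Matrix) -> Matrix:
    """Hypothetical helper: V_hat = sum_t bases[t] @ shift_right(H, t).

    bases[t] is the F x K dictionary for time offset t, and H is K x N.
    With a single basis (T = 1) this reduces to ordinary NMF, V_hat = W @ H.
    """
    v_hat = matmul(bases[0], h)
    for t in range(1, len(bases)):
        v_hat = add(v_hat, matmul(bases[t], shift_right(h, t)))
    return v_hat


def squared_error(v: Matrix, v_hat: Matrix) -> float:
    """Squared Frobenius error ||V - V_hat||^2, one common NMF-style cost."""
    return sum((x - y) ** 2 for rv, rh in zip(v, v_hat) for x, y in zip(rv, rh))


if __name__ == "__main__":
    # One component whose 2-frame pattern is "energy in bin 0, then in bin 1".
    w0 = [[1.0], [0.0]]
    w1 = [[0.0], [1.0]]
    h = [[1.0, 0.0, 0.0]]  # a single activation at frame 0
    print(cnmf_reconstruct([w0, w1], h))
    # → [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```

In the usage example, one activation in H produces a pattern that spans two frames. Plain NMF, with a single W, could only reproduce this by learning two separate components. This is the temporal dependency that CNMF captures and that the abstract notes TNMF lacks.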