Catalog-based single-channel speech-music separation with the Itakura-Saito divergence

  • Cemil Demir, Ali Taylan Cemgil, Murat Saraçlar
  • Published 18 October 2012
  • Computer Science, Mathematics
  • 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO)
In this study, we introduce a catalog-based single-channel speech-music separation method with the Itakura-Saito (IS) divergence measure. Previously, we developed the catalog-based separation method with the Kullback-Leibler (KL) divergence. From a probabilistic point of view, the IS divergence corresponds to a complex Gaussian observation model. The comparison of divergence measures, or equivalently observation models, on the speech-music separation task is carried out with both the catalog-based and traditional Non-negative Matrix Factorization (NMF) methods.
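The IS-NMF machinery underlying this comparison can be sketched with the standard multiplicative updates for the Itakura-Saito cost. The function below is an illustrative NumPy implementation under generic NMF notation (`V ≈ W @ H`), not the authors' code; the catalog-based method additionally fixes parts of the basis to pre-stored catalog spectra.

```python
import numpy as np

def is_nmf(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Multiplicative-update NMF under the Itakura-Saito divergence.

    V    : nonnegative power spectrogram, shape (freq, frames).
    Returns W (basis spectra) and H (activations) with V ~= W @ H.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, N)) + eps
    for _ in range(n_iter):
        # Update activations: H <- H * (W^T (V / (WH)^2)) / (W^T (WH)^-1)
        WH = W @ H + eps
        H *= (W.T @ (V * WH**-2)) / (W.T @ WH**-1)
        # Update bases with the symmetric rule, using the refreshed H
        WH = W @ H + eps
        W *= ((V * WH**-2) @ H.T) / (WH**-1 @ H.T)
    return W, H
```

Because the IS divergence d(x|y) = x/y − log(x/y) − 1 is scale-invariant, these updates weight low-energy time-frequency bins as heavily as high-energy ones, which is one reason IS-NMF is often preferred for audio power spectra.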
Incorporating Prior Information in Nonnegative Matrix Factorization for Audio Source Separation
This thesis improves the performance of NMF for source separation by incorporating additional constraints and prior information about the source signals into the NMF decomposition, and improves the NMF training of the basis models.


Catalog-based single-channel speech-music separation for automatic speech recognition
This paper proposes to recover the speech signal from the mixed signal in the time domain by detecting the active catalog frames using the catalog-based method, and compares the performance of three different signal reconstruction techniques: Expectation-Based, Posterior-Based and Time-Domain reconstruction.
Gain estimation approaches in catalog-based single-channel speech-music separation
This paper addresses the gain estimation problem of the catalog-based single-channel speech-music separation method and proposes three different approaches to overcome this problem, with Gamma Markov Chain approach achieving the best performance.
Semi-Supervised Single-Channel Speech-Music Separation for Automatic Speech Recognition
A semi-supervised speech-music separation method is proposed which uses the speech, music and speech-music segments of a given segmented audio signal to separate the speech and music signals from each other in the mixed speech-music segments.
Evaluation of several strategies for single sensor speech/music separation
This paper proposes a new system that employs separate models for the speech and music signals, and demonstrates improved speech/music separation performance under several evaluation criteria.
Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis
Results indicate that IS-NMF correctly captures the semantics of audio and is better suited to the representation of music signals than NMF with the usual Euclidean and KL costs.
Non-negative matrix factorization based compensation of music for automatic speech recognition
Non-negative matrix factorization based speech enhancement for robust automatic recognition of mixtures of speech and music is proposed and shown to produce a consistent, significant improvement in recognition performance compared with the baseline method.
Single-channel speech separation using sparse non-negative matrix factorization
It is shown that computational savings can be achieved by segmenting the training data at the phoneme level, and that the unsupervised and supervised adaptation schemes result in significant improvements in the target-to-masker ratio.
Single-Channel Speech Separation using Sparse Non-Negative Matrix Factorization
We apply machine learning techniques to the problem of separating multiple speech sources from a single microphone recording. The method of choice is a sparse non-negative matrix factorization.
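The supervised core of this idea can be sketched as follows: basis spectra for each source are trained in advance, only the activations are fitted on the mixture, and each source is recovered with a Wiener-style mask. This sketch uses plain Euclidean-cost multiplicative updates for simplicity; the function name and setup are illustrative (the paper additionally imposes a sparsity penalty on the activations).

```python
import numpy as np

def separate_with_fixed_bases(V, W_speech, W_music, n_iter=100, eps=1e-12, seed=0):
    """Supervised single-channel separation sketch.

    V         : nonnegative mixture spectrogram, shape (freq, frames).
    W_speech,
    W_music   : pre-trained basis spectra for each source (freq, K_i).
    Returns masked spectrogram estimates (speech, music).
    """
    W = np.hstack([W_speech, W_music])     # concatenated dictionary, held fixed
    K1 = W_speech.shape[1]
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # Lee-Seung multiplicative update for the Euclidean cost, W fixed
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    # Per-source reconstructions and Wiener-style soft masks
    V_speech = W[:, :K1] @ H[:K1]
    V_music = W[:, K1:] @ H[K1:]
    total = V_speech + V_music + eps
    return V * (V_speech / total), V * (V_music / total)
```

Masking rather than direct resynthesis keeps the two estimates summing exactly to the observed mixture, which is the usual design choice in NMF-based separation front ends.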
Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria
  • Tuomas Virtanen
  • Mathematics, Computer Science
    IEEE Transactions on Audio, Speech, and Language Processing
  • 2007
An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented; it achieves better separation quality than previous algorithms.
The effects of background music on speech recognition accuracy
  • B. Raj, V. Parikh, R. Stern
  • Computer Science
    1997 IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 1997
The effects of different kinds of music on automatic speech recognition systems are examined by comparing them with the relatively well-known effects of white noise, and by examining the extent to which compensation algorithms that have been successfully applied to noisy speech help improve recognition accuracy for speech corrupted by music.