Single-channel audio source separation with NMF: divergences, constraints and algorithms

@inproceedings{Fvotte2018SinglechannelAS,
  title={Single-channel audio source separation with NMF: divergences, constraints and algorithms},
  author={C{\'e}dric F{\'e}votte and Emmanuel Vincent and Alexey Ozerov},
  year={2018}
}
Spectral decomposition by nonnegative matrix factorisation (NMF) has become state-of-the-art practice in many audio signal processing tasks, such as source separation, enhancement or transcription. This chapter reviews the fundamentals of NMF-based audio decomposition, in unsupervised and informed settings. We formulate NMF as an optimisation problem and discuss the choice of the measure of fit. We present the standard majorisation-minimisation strategy to address optimisation for NMF with the… 
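As a rough illustration of the optimisation view described in the abstract, the sketch below factorises a nonnegative matrix V into W H with multiplicative updates, which for this objective can be derived by majorisation-minimisation. The measure of fit used here, the generalised Kullback-Leibler divergence, is an assumption made for the example, since the abstract does not single one out. The function names `nmf_kl` and `kl_divergence` are hypothetical and are not taken from the chapter.

```python
import numpy as np


def kl_divergence(V, V_hat, eps=1e-12):
    """Generalised Kullback-Leibler divergence D(V | V_hat), summed over all entries."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))


def nmf_kl(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Approximate a nonnegative matrix V (F x N) by W @ H.

    W (F x rank) and H (rank x N) stay nonnegative because each update
    multiplies the current value by a nonnegative ratio.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Update the activations H with the dictionary W held fixed.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update the dictionary W with the activations H held fixed.
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


# Toy stand-in for a magnitude spectrogram: a random rank-3 nonnegative matrix.
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 30))
W, H = nmf_kl(V, rank=3)
print("final divergence:", kl_divergence(V, W @ H))
```

In an audio setting V would typically be a magnitude or power spectrogram, with the columns of W acting as spectral patterns and the rows of H as their activations over time. The small constant `eps` only guards against division by zero and is a choice made for this sketch.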

Unsupervised Audio Source Separation using Generative Priors

This work proposes a novel approach for audio source separation based on generative priors trained on individual sources that simultaneously searches in the source-specific latent spaces to effectively recover the constituent sources through the use of projected gradient descent optimization.

Phase Retrieval With Bregman Divergences and Application to Audio Signal Recovery

Phase retrieval (PR) aims to recover a signal from the magnitudes of a set of inner products. This problem arises in many audio signal processing applications which operate on a short-time Fourier

A Wavenet for Music Source Separation

The experimental results show that it is possible to approach the problem of music source separation in an end-to-end learning fashion, since the proposed model performs on par with DeepConvSep, a state-of-the-art method based on processing magnitude spectrograms.

Musical Instrument Separation on Shift-Invariant Spectrograms via Stochastic Dictionary Learning

A time-frequency representation that is both shift-invariant and frequency-aligned is developed, with a variant that can also be used for wideband signals, and the reasonability of the representation is ensured by a sparsity condition.

Adaptive Autoregressive Pre-whitening for Speech and Audio Signals through Parametric NMF

The proposed pre-whitener in combination with parametric methods, such as a recently introduced Bayesian pitch tracker, improves the estimation accuracy of a time of arrival (TOA) estimation method in a scenario in which the noise is colored.

Adaptive Pre-whitening Based on Parametric NMF

An adaptive pre-whitener based on a supervised non-negative matrix factorization (NMF), in which a pre-trained dictionary includes parametrized spectral information about the noise and speech sources in the form of autoregressive (AR) coefficients, shows that the noise can get closer to white, in comparison to pre-whiteners based on conventional noise power spectral density (PSD) estimates.

NMF-weighted SRP for multi-speaker direction of arrival estimation: robustness to spatial aliasing while exploiting sparsity in the atom-time domain

An approach that decomposes a signal spectrum into a weighted sum of broadband spectral components (atoms) and then exploits signal sparsity in the time-atom representation for simultaneous localization of multiple speakers.

Sparse pursuit and dictionary learning for blind source separation in polyphonic music recordings

This work develops a novel sparse pursuit algorithm that can match the discrete frequency spectra from the recorded signal with the continuous spectra delivered by the model, and develops a dictionary that contains the model parameters which characterize the musical instruments, using a modified version of Adam.

Adding Context Information to Deep Neural Network based Audio Source Separation

A novel self-attention mechanism is proposed, which is able to filter out unwanted interferences and distortions by utilizing the repetitive nature of music.

References

Score-Informed Source Separation for Musical Audio Recordings: An overview

Recent developments in score-informed source separation are reviewed and various strategies for integrating the prior knowledge encoded by the score are discussed.

Sparse NMF – half-baked or well done?

Results show that, contrary to a popular belief in the community, learning basis functions using NMF with sparsity leads to significant gains in source-to-distortion ratio with respect to both exemplar-based NMF and the ad hoc implementation of sparse NMF.

A comparative study on sparsity penalties for NMF-based speech separation: Beyond LP-norms

The results show that enforcing the sparsity constraint in the separation phase does not improve the perceptual quality, but in the learning phase it yields a better estimation of the base spectra, especially in the case of supervised NMF, where the proposed criteria delivered the best results.

Gamma Markov Random Fields for Audio Source Modeling

  • O. Dikmen, A. Cemgil
  • Computer Science
    IEEE Transactions on Audio, Speech, and Language Processing
  • 2010
This paper optimizes the hyperparameters of the GMRF-based audio model using contrastive divergence and compares this method to alternatives such as score matching and pseudolikelihood maximization where applicable.

An interactive audio source separation framework based on non-negative matrix factorization

A novel interactive source separation framework that allows end-users to provide feedback at each separation step so as to gradually improve the result and is based on non-negative matrix factorization.

Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria

  • T. Virtanen
  • Computer Science
    IEEE Transactions on Audio, Speech, and Language Processing
  • 2007
An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented and enables a better separation quality than the previous algorithms.

Extended Nonnegative Tensor Factorisation Models for Musical Sound Source Separation

This paper proposes a new additive synthesis-based approach which allows the use of linear-frequency spectrograms as well as imposing strict harmonic constraints, resulting in an improved model.

Sound Source Separation Using Sparse Coding with Temporal Continuity Objective

A data-adaptive sound source separation system is presented, which is able to extract meaningful sources from polyphonic real-world music signals, and a temporal continuity objective is proposed as an improvement to the existing techniques.

Universal speech models for speaker independent single channel source separation

  • Dennis L. Sun, G. Mysore
  • Computer Science
    2013 IEEE International Conference on Acoustics, Speech and Signal Processing
  • 2013
This work proposes a method to learn a universal speech model from a general corpus of speech, shows how to use this model to separate speech from other sound sources, and shows that the method improves performance when training data of the non-speech source is available.

Separation of Vocals from Polyphonic Audio Recordings

The quality of vocal source separation is not sufficient for further F0 analysis to extract the melody line from the vocal track, so techniques to identify vocal sections in a music sample are presented and a classifier to perform a vocal–nonvocal segmentation task is designed.
...