Weakly Supervised Audio Source Separation via Spectrum Energy Preserved Wasserstein Learning

Ning Zhang, Junchi Yan, Yuchen Zhou
Separating audio mixtures into individual instrument tracks has been a long-standing challenge. We introduce a novel weakly supervised audio source separation approach based on deep adversarial learning. Specifically, our loss function adopts the Wasserstein distance, which directly measures the distributional distance between the separated sources and the real sources for each individual source. Moreover, a global regularization term is added to fulfill the spectrum energy preservation property…
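The abstract above names two loss terms: a per-source Wasserstein critic loss and a global spectrum-energy-preservation regularizer. A minimal numpy sketch of how such terms are commonly computed is given below; the function names and the squared-error form of the regularizer are assumptions for illustration, not taken from the paper:

```python
import numpy as np

def critic_wasserstein_loss(critic_real, critic_fake):
    """WGAN-style critic objective for one source: the critic maximizes
    E[f(real)] - E[f(separated)], so the loss to minimize is the negation."""
    return -(np.mean(critic_real) - np.mean(critic_fake))

def energy_preservation_penalty(mixture_spec, separated_specs):
    """Global regularizer (assumed squared-error form): the separated
    source spectra, summed over sources, should reconstruct the
    mixture spectrum, preserving its total energy."""
    reconstruction = np.sum(separated_specs, axis=0)
    return np.mean((mixture_spec - reconstruction) ** 2)
```

When the separated spectra sum exactly to the mixture spectrum the penalty is zero, so the regularizer only activates when the separator creates or destroys spectral energy.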

Related papers

Audio Source Separation Using Variational Autoencoders and Weak Class Supervision
This letter proposes a source separation method that is trained by observing mixtures and the class labels of the sources present in them, without any access to isolated sources, and shows that the separation performance obtained is as good as that obtained with source-signal supervision.
Weak Label Supervision for Monaural Source Separation Using Non-negative Denoising Variational Autoencoders
This paper proposes a weak supervision method that only uses class information rather than source signals for learning to separate short utterance mixtures, and demonstrates that the separation results are on par with source signal supervision.
Learning to Separate Sounds from Weakly Labeled Scenes
This work proposes objective functions and network architectures that enable training a source separation system with weak labels, and benchmarks performance using synthetic mixtures of overlapping sound events recorded in urban environments.
Music Source Separation with Generative Flow
Experiments show that, in singing voice and music separation tasks, the proposed systems achieve results competitive with a fully supervised system, and that one variant can separate new source tracks effortlessly.
Transcription Is All You Need: Learning To Separate Musical Mixtures With Score As Supervision
This work uses musical scores, which are comparatively easy to obtain, as a weak label for training a source separation system, and proposes two novel adversarial losses for additional fine-tuning of both the transcriptor and the separator.
Training Generative Adversarial Networks from Incomplete Observations using Factorised Discriminators
This work shows how to factorise the joint data distribution into a set of lower-dimensional distributions along with their dependencies, which allows splitting the discriminator in a GAN into multiple "sub-discriminators" that can be independently trained from incomplete observations.
Singing Voice Separation with Deep U-Net Convolutional Networks
This work proposes a novel application of the U-Net architecture, initially developed for medical imaging, to the task of source separation, given its proven capacity for recreating the fine, low-level detail required for high-quality audio reproduction.
Research on Deep Sound Source Separation
The β-VAE model combined with the weakly supervised classification proposed by Karamatlı et al. is first reproduced, and it is shown that separation results can be obtained by retraining the model with newly established 'male' and 'female' labels.
Source Separation with Weakly Labelled Data: an Approach to Computational Auditory Scene Analysis
This work proposes a source separation framework trained with weakly labelled data that can separate 527 kinds of sound classes from AudioSet within a single system.
SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning
This paper introduces a TSE framework, SoundBeam, that combines the advantages of both class-label-based and enrollment-based approaches, and performs an extensive evaluation of the different TSE schemes using synthesized and real mixtures, which shows the potential of SoundBeam.
SVSGAN: Singing Voice Separation Via Generative Adversarial Network
Experimental results on three datasets show that performance can be improved by the proposed framework consisting of conventional networks for singing voice separation using the generative adversarial network with a time-frequency masking function.
Generative Adversarial Source Separation
  • Y. C. Sübakan and P. Smaragdis, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018
It is shown on a speech source separation experiment that a multilayer perceptron trained with a Wasserstein-GAN formulation outperforms NMF, auto-encoders trained with maximum likelihood, and variational auto-encoders in terms of source-to-distortion ratio.
Improving music source separation based on deep neural networks through data augmentation and network blending
This paper describes two different deep neural network architectures for separating music into individual instrument tracks, one feed-forward and one recurrent, and shows that each yields state-of-the-art results on the SiSEC DSD100 dataset.
Improved Training of Wasserstein GANs
This work proposes an alternative to clipping weights: penalize the norm of gradient of the critic with respect to its input, which performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning.
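The gradient-penalty idea summarized above can be sketched in a few lines of numpy. Here the critic's input gradient is supplied by the caller (`critic_grad_fn` is a placeholder for illustration; a real implementation would obtain it via automatic differentiation):

```python
import numpy as np

def gradient_penalty(critic_grad_fn, real, fake, lam=10.0, seed=0):
    """WGAN-GP regularizer: penalize the critic's input-gradient norm
    for deviating from 1, evaluated at random points on the straight
    lines between real and generated samples."""
    rng = np.random.default_rng(seed)
    eps = rng.uniform(size=(real.shape[0], 1))   # per-sample mixing weight
    interp = eps * real + (1.0 - eps) * fake     # interpolated inputs
    grads = critic_grad_fn(interp)               # dCritic/dInput at interp
    norms = np.linalg.norm(grads.reshape(len(grads), -1), axis=1)
    return lam * np.mean((norms - 1.0) ** 2)
```

For a linear critic f(x) = w·x the input gradient is w everywhere, so the penalty reduces to lam * (||w|| - 1)^2, which makes the behaviour easy to verify by hand.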
Multichannel Audio Source Separation With Deep Neural Networks
This article proposes a framework where deep neural networks are used to model the source spectra and combined with the classical multichannel Gaussian model to exploit the spatial information and presents its application to a speech enhancement problem.
Scalable audio separation with light Kernel Additive Modelling
It is shown how KAM can be combined with a fast compression algorithm of its parameters to address the scalability issue, thus enabling its use on small platforms or mobile devices.
Monoaural Audio Source Separation Using Deep Convolutional Neural Networks
A low-latency monaural source separation framework using a convolutional neural network is presented, and the network's performance is evaluated on a database comprising musical mixtures of three instruments as well as other instruments that vary from song to song.
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
This work introduces a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrates that they are a strong candidate for unsupervised learning.
A General Flexible Framework for the Handling of Prior Information in Audio Source Separation
This paper introduces a general audio source separation framework based on a library of structured source models that enable the incorporation of prior knowledge about each source via user-specifiable constraints.
From Blind to Guided Audio Source Separation: How models and side information can improve the separation of sound
Audio is a domain where signal separation has long been considered a fascinating objective, potentially offering a wide range of new possibilities and experiences in professional and personal…