Monaural Music Source Separation using a ResNet Latent Separator Network

Gino Brunner, Nawel Naas, Sveinn Pálsson, Oliver Richter, Roger Wattenhofer. 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI).
In this paper we study the problem of monaural music source separation, where a piece of music is to be separated into its main constituent sources. We propose a simple yet effective deep neural network architecture based on a ResNet autoencoder. We investigate several data augmentation and post-processing methods to improve the separation results, and outperform various state-of-the-art monaural source separation methods on the DSD100 and MUSDB18 datasets. Our results suggest that in order to…
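The abstract does not spell out the architecture, but the core building unit of any ResNet-style autoencoder is the residual block, which adds a learned correction onto an identity shortcut. A minimal NumPy sketch of that idea (the function and weight names are illustrative, not taken from the paper):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """One residual unit: output = x + F(x), where F is a small
    two-layer transformation and x passes through unchanged on the
    identity shortcut."""
    h = relu(x @ w1)   # inner transformation F(x)...
    return x + h @ w2  # ...added back onto the shortcut
```

With `w2` set to zero the block reduces to the identity, which is part of what makes deep stacks of such units easy to optimize.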

Sudo rm -rf: Efficient Networks for Universal Audio Source Separation

The backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRM-RF), together with their aggregation, which is performed through simple one-dimensional convolutions.

An Efficient Short-Time Discrete Cosine Transform and Attentive MultiResUNet Framework for Music Source Separation

A novel Attentive MultiResUNet architecture is proposed that uses real-valued Short-Time Discrete Cosine Transform data as inputs; it is applied to source separation for the first time and is more computationally efficient than state-of-the-art separation networks.

Compute and Memory Efficient Universal Sound Source Separation

This study provides a family of efficient neural network architectures for general purpose audio source separation while focusing on multiple computational aspects that hinder the application of neural networks in real-world scenarios.

ResNet based on feature-inspired gating strategy

A feature-inspired gating strategy in the residual unit of ResNet is introduced, which allows the network giving different weights to different features, so that the implementation of the feature fusion can be transformed from adding features with equal weights into weighted summation with different weights.
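The weighted fusion described above can be sketched as a sigmoid gate replacing the equal-weight addition of a plain residual unit. A minimal NumPy sketch, with all names illustrative rather than taken from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_residual_fusion(x, f_x, gate_logits):
    """Fuse the shortcut x and the residual branch f_x with learned
    per-feature weights instead of an equal-weight sum."""
    g = sigmoid(gate_logits)  # per-feature weight in (0, 1)
    return x + g * f_x        # weighted summation of the two paths
```

With strongly negative gate logits the branch is suppressed and the unit behaves like a pure shortcut; with zero logits it recovers a half-weight residual sum.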

  • P. Talbot, B. Bliem
  • 2022 4th International Conference on Applied Automation and Industrial Diagnostics (ICAAID)
  • 2022

Improving music source separation based on deep neural networks through data augmentation and network blending

This paper describes two different deep neural network architectures for the separation of music into individual instrument tracks, a feed-forward and a recurrent one, and shows that each of them yields state-of-the-art results on the SiSEC DSD100 dataset.

Monoaural Audio Source Separation Using Deep Convolutional Neural Networks

A low-latency monaural source separation framework using a convolutional neural network is presented, and its performance is evaluated on a database comprising musical mixtures of three instruments as well as other instruments that vary from song to song.

MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation

This work builds upon the recently proposed Masker-Denoiser (MaD) architecture and enhances it with the Twin Networks, a technique to regularize a recurrent generative network using a backward running copy of the network.

Deep learning for monaural speech separation

The joint optimization of the deep learning models (deep neural networks and recurrent neural networks) with an extra masking layer, which enforces a reconstruction constraint, is proposed to enhance the separation performance of monaural speech separation models.
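The extra masking layer mentioned above can be sketched as a soft ratio mask: the network's raw source estimates are normalized so that, in every time-frequency bin, the separated magnitudes sum back to the mixture, which is the reconstruction constraint. A minimal NumPy sketch with illustrative names:

```python
import numpy as np

def masking_layer(mixture_mag, estimates):
    """mixture_mag: (freq, time) magnitude spectrogram of the mixture.
    estimates: (n_sources, freq, time) non-negative network outputs.
    Returns masked source magnitudes that sum back to the mixture."""
    total = estimates.sum(axis=0, keepdims=True) + 1e-8  # avoid /0
    masks = estimates / total          # soft ratio masks, sum to ~1
    return masks * mixture_mag[None]   # enforces reconstruction
```

Because the masks are computed from the network outputs and then applied inside the model, the reconstruction constraint is differentiable and can be trained jointly with the rest of the network.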

MMDenseLSTM: An Efficient Combination of Convolutional and Recurrent Neural Networks for Audio Source Separation

A novel architecture that integrates long short-term memory (LSTM) in multiple scales with skip connections to efficiently model long-term structures within an audio context is proposed and yields better results than those obtained using ideal binary masks for a singing voice separation task.

Multi-Scale Multi-Band DenseNets for Audio Source Separation

A novel network architecture that extends the recently developed densely connected convolutional network (DenseNet) and takes advantage of long contextual information and outperforms state-of-the-art results on SiSEC 2016 competition by a large margin in terms of signal-to-distortion ratio.

MUSDB18 - a corpus for music separation

The sigsep MUSDB18 data set consists of a total of 150 full-track songs of different styles and includes both the stereo mixtures and the original sources, divided between a training subset and a test subset.

Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation

Joint optimization of masking functions and deep recurrent neural networks is explored for monaural source separation tasks, including speech separation, singing voice separation, and speech denoising, together with a discriminative training criterion that further enhances separation performance.

Monaural Singing Voice Separation with Skip-Filtering Connections and Recurrent Inference of Time-Frequency Mask

A recurrent inference algorithm, a sparse transformation step to improve the mask generation process, and a learned denoising filter are introduced; the resulting method learns and optimizes a source-dependent mask and does not need a post-processing step.

An Overview of Lead and Accompaniment Separation in Music

This article provides a comprehensive review of this research topic, organizing the different approaches according to whether they are model-based or data-centered, and presents the results of the largest evaluation to date of lead and accompaniment separation systems.