Music Demixing Challenge 2021

@article{Mitsufuji2021MusicDC,
  title={Music Demixing Challenge 2021},
  author={Yuki Mitsufuji and Giorgio Fabbro and Stefan Uhlich and Fabian-Robert St{\"o}ter and Alexandre D{\'e}fossez and Minseok Kim and Woo-Sung Choi and Chin-Yun Yu and Kin Wai Cheuk},
  journal={Frontiers in Signal Processing},
  year={2021}
}
Music source separation has been intensively studied in the last decade, and tremendous progress has been made with the advent of deep learning. Evaluation campaigns such as MIREX or SiSEC connected state-of-the-art models and corresponding papers, helping researchers integrate best practices into their models. In recent years, the widely used MUSDB18 dataset has played an important role in measuring the performance of music source separation. While the dataset made a considerable…
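The systems below are typically compared by signal-to-distortion ratio (SDR). As a rough illustration only (not the exact challenge implementation), here is a minimal NumPy sketch of the simple energy-ratio form of SDR, with a small epsilon assumed for numerical safety:

```python
import numpy as np

def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-10) -> float:
    """Signal-to-distortion ratio in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    num = np.sum(reference ** 2)
    den = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10(num / (den + eps) + eps)

rng = np.random.default_rng(0)
s = rng.standard_normal(44100)          # one second of audio at 44.1 kHz
print(sdr(s, s))                        # near-perfect estimate: very large (denominator ~ 0)
print(sdr(s, s + 0.1 * rng.standard_normal(44100)))  # 10% noise: roughly 20 dB
```

Higher is better: a perfect estimate drives the distortion energy toward zero, so the ratio (and the dB score) grows without bound, while added noise lowers it.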


Automatic music mixing with deep learning and out-of-domain data

This work explores whether out-of-domain data, such as wet or processed multitrack music recordings, can be repurposed to train supervised deep learning models that bridge the current gap in automatic mixing quality.

Towards robust music source separation on loud commercial music

This paper created the out-of-domain evaluation datasets musdb-L and musdb-XL by mimicking the music mastering process, and proposed the LimitAug data augmentation method, which applies an online limiter during training data sampling to reduce the domain mismatch.

Music Source Separation with Band-split RNN

BSRNN is proposed, a frequency-domain model that explicitly splits the spectrogram of the mixture into subbands and performs interleaved band-level and sequence-level modeling; a semi-supervised fine-tuning pipeline that can further improve the model's performance is also described.

Hybrid Transformers for Music Source Separation

A natural question arising in Music Source Separation (MSS) is whether long-range contextual information is useful, or whether local acoustic features are sufficient. In other fields, attention-based…

Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects

From the results, it is shown that the proposed system not only converts the mixing style of multitrack audio to closely match a reference, but is also robust to mixture-wise style transfer when combined with a music source separation model.

Music Source Separation With Deep Equilibrium Models

An architecture and training scheme for MSS with DEQ, which replaces the architecture of Open-Unmix with the DEQ model and shows that DEQ-UMX performs better than the original UMX while reducing its number of parameters by 30%.

Multi-Scale Temporal-Frequency Attention for Music Source Separation

A temporal-frequency attention module is proposed to model the spectrogram correlations along both temporal and frequency dimensions, and multi-scale attention is suggested to effectively capture the correlations in music signals.

Removing Distortion Effects in Music Using Deep Neural Networks

This paper focuses on removing distortion and clipping applied to guitar tracks for music production while presenting a comparative investigation of different deep neural network (DNN) architectures on this task, achieving exceptionally good results in distortion removal using DNNs.

MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation

An evaluation dataset and baseline studies for multiple singing voices separation are presented, and the improved super-resolution network (iSRNet) is proposed, which achieved comparable performance to ideal time-frequency masks on duet and unison subsets of MedleyVox.

Music Separation Enhancement with Generative Modeling

A post-processing generative model (the Make it Sound Good (MSG) post-processor) is proposed to enhance the output of music source separation systems and it is demonstrated that human listeners prefer source estimates of bass and drums that have been post-processed by MSG.

References

Showing 1–10 of 37 references

Open-Unmix - A Reference Implementation for Music Source Separation

Open-Unmix provides implementations for the most popular deep learning frameworks, giving researchers a flexible way to reproduce results and provides a pre-trained model for end users and even artists to try and use source separation.

Danna-Sep: Unite to separate them all

The backbones of the proposed framework are two spectrogram-based models including a modified X-UMX and U-Net, and an enhanced Demucs as the waveform-based model, and it is shown that Danna-Sep surpassed the SoTA models by a large margin in terms of Source-to-Distortion Ratio.

All For One And One For All: Improving Music Separation By Bridging Networks

Experimental results show that the performance of Open-Unmix (UMX), a well-known and state-of-the-art open-source library for music separation, can be improved by utilizing a multi-domain loss (MDL) and two combination schemes.

Improving music source separation based on deep neural networks through data augmentation and network blending

This paper describes two different deep neural network architectures for the separation of music into individual instrument tracks, a feed-forward and a recurrent one, each of which yields state-of-the-art results on the SiSEC DSD100 dataset.

Music Source Separation in the Waveform Domain

Demucs is proposed, a new waveform-to-waveform model, which has an architecture closer to models for audio generation with more capacity on the decoder, and human evaluations show that Demucs has significantly higher quality than Conv-Tasnet, but slightly more contamination from other sources, which explains the difference in SDR.

MUSDB18-HQ - an uncompressed version of MUSDB18

MUSDB18-HQ is the uncompressed version of the MUSDB18 dataset. It consists of a total of 150 full-track songs of different styles and includes both the stereo mixtures and the original sources, divided between a training subset and a test subset.

Multi-Scale multi-band densenets for audio source separation

A novel network architecture that extends the recently developed densely connected convolutional network (DenseNet) and takes advantage of long contextual information and outperforms state-of-the-art results on SiSEC 2016 competition by a large margin in terms of signal-to-distortion ratio.

Spleeter: a fast and efficient music source separation tool with pre-trained models

The performance of the pre-trained models is very close to the published state-of-the-art, and Spleeter is one of the best performing 4-stem separation models on the common musdb18 benchmark.

MUSDB18 - a corpus for music separation

The sigsep musdb18 data set consists of a total of 150 full-track songs of different styles and includes both the stereo mixtures and the original sources, divided between a training subset and a test subset.

The 2018 Signal Separation Evaluation Campaign

This year's edition of SiSEC was focused on audio and pursued the effort towards scaling up and making it easier to prototype audio separation software in an era of machine-learning based systems, including a new music separation database: MUSDB18.