Singing Voice Separation: A Study on Training Data

@article{Prtet2019SingingVS,
  title={Singing Voice Separation: A Study on Training Data},
  author={Laure Pr{\'e}tet and Romain Hennequin and Jimena Royo-Letelier and Andrea Vaglio},
  journal={ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2019},
  pages={506-510}
}
In recent years, singing voice separation systems have shown increased performance thanks to supervised training. The design of the training dataset is known to be a crucial factor in the performance of such systems. We investigate how the characteristics of the training dataset impact the separation performance of state-of-the-art singing voice separation algorithms. We show that separation quality and diversity are two important and complementary assets of a good training dataset…


Improved singing voice separation with chromagram-based pitch-aware remixing
Controlled experiments in both supervised and semi-supervised settings demonstrate that training models with pitch-aware remixing significantly improves the test signal-to-distortion ratio (SDR).
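As a rough illustration of the idea, and not the paper's exact procedure, pitch-compatible stem pairs can be selected by comparing chromagrams before remixing. The librosa calls below are standard; the pairing criterion and threshold are assumptions:

import numpy as np
import librosa

def chroma_profile(y, sr):
    # 12-bin pitch-class profile: time-averaged chromagram, L2-normalized.
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    profile = chroma.mean(axis=1)
    return profile / (np.linalg.norm(profile) + 1e-8)

def pitch_compatible(vocals, accomp, sr, threshold=0.8):
    # Hypothetical pairing rule: remix a vocal with a foreign accompaniment
    # only if their pitch-class profiles are similar enough.
    sim = float(np.dot(chroma_profile(vocals, sr), chroma_profile(accomp, sr)))
    return sim >= threshold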
Semi-Supervised Singing Voice Separation With Noisy Self-Training
Empirical results show that the proposed self-training scheme, combined with data augmentation methods, effectively leverages a large unlabeled corpus and obtains performance superior to supervised methods.
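A minimal sketch of the generic noisy self-training loop follows; train_on, separate, and confidence are hypothetical placeholders, not the paper's implementation, and the filtering threshold is an assumption:

def train_on(pairs):
    """Placeholder: fit a separator on (mixture, vocals) pairs."""
    raise NotImplementedError

def separate(model, mixture):
    """Placeholder: run the separator, return estimated vocals."""
    raise NotImplementedError

def confidence(mixture, vocals):
    """Placeholder: heuristic quality score for a pseudo-label."""
    raise NotImplementedError

def self_train(labeled, unlabeled, rounds=3, thresh=0.5):
    teacher = train_on(labeled)                       # supervised baseline
    for _ in range(rounds):
        pseudo = [(m, separate(teacher, m)) for m in unlabeled]
        kept = [p for p in pseudo if confidence(*p) > thresh]
        teacher = train_on(labeled + kept)            # student becomes teacher
    return teacher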
Content based singing voice source separation via strong conditioning using aligned phonemes
It is shown that phoneme conditioning can be successfully applied to improve singing voice source separation, and strong conditioning using aligned phonemes is explored.
CatNet: music source separation system with mix-audio augmentation
This article proposes an end-to-end, fully differentiable system that incorporates spectrogram computation into CatNet, along with a novel mix-audio data augmentation method that randomly mixes audio segments from the same source to create augmented training segments.
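A small sketch of the mix-audio idea as described above, assuming same-length mono segments stored as NumPy arrays; the equal-gain averaging is an assumption, not CatNet's exact recipe:

import random
import numpy as np

def mix_audio_augment(segments_by_source):
    # For each source (e.g. "vocals", "accompaniment"), randomly pick two
    # equal-length segments from that same source and average them into one
    # augmented segment; the training mixture is the sum of augmented sources.
    augmented = {}
    for source, segments in segments_by_source.items():
        a, b = random.sample(segments, 2)
        augmented[source] = 0.5 * (a + b)
    mixture = np.sum(list(augmented.values()), axis=0)
    return mixture, augmented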
Cutting Music Source Separation Some Slakh: A Dataset to Study the Impact of Training Data Quality and Quantity
It is shown that the synthesized Lakh dataset (Slakh) can be used to effectively augment existing datasets for musical instrument separation, while opening the door to a wide array of data-intensive music signal analysis tasks.
Spleeter: A Fast and State-of-the-Art Music Source Separation Tool with Pre-trained Models
Spleeter is a new TensorFlow-based tool for music source separation that makes it possible to separate audio files into 2, 4 or 5 stems with a single command line, using pre-trained models.
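For reference, separation can be invoked from the command line or from Python; the snippet below uses Spleeter's documented pretrained-model names, though exact flags can vary between Spleeter versions:

# Python API (pip install spleeter); the rough CLI equivalent is:
#   spleeter separate -p spleeter:4stems -o output song.mp3
from spleeter.separator import Separator

separator = Separator('spleeter:4stems')          # 2stems, 4stems or 5stems
separator.separate_to_file('song.mp3', 'output')  # writes one file per stem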
Depthwise Separable Convolutions Versus Recurrent Neural Networks for Monaural Singing Voice Separation
This paper focuses on singing voice separation with an RNN architecture and replaces the RNNs with depthwise separable convolutions (DWS-CNNs), a lightweight and faster variant of typical convolutions.
Adding Context Information to Deep Neural Network based Audio Source Separation
A novel self-attention mechanism is proposed, which is able to filter out unwanted interferences and distortions by utilizing the repetitive nature of music.
Spleeter: a fast and efficient music source separation tool with pre-trained models
The performance of the pre-trained models is very close to the published state of the art; Spleeter is one of the best-performing 4-stem separation models on the common musdb18 benchmark (Rafii, Liutkus, Stöter, Mimilakis, & Bittner, 2017).
Ensemble Size Classification in Colombian Andean String Music Recordings
This work frames counting the number of instruments in music recordings as a classification task, using a new dataset of Colombian Andean string music, and serves as a baseline for future research on ensemble size classification.

References

Showing references 1-10 of 24.
On the Improvement of Singing Voice Separation for Monaural Recordings Using the MIR-1K Dataset
Chao-Ling Hsu, J. Jang • IEEE Transactions on Audio, Speech, and Language Processing • 2010
This paper constructs a corpus called MIR-1K (Multimedia Information Retrieval lab, 1000 song clips), in which all singing voices and music accompaniments were recorded separately, and enhances the performance of separating voiced singing via a spectral subtraction method.
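The spectral-subtraction step can be sketched as follows using librosa; this is the generic textbook formulation, not necessarily the exact variant applied to MIR-1K, and the STFT parameters are assumptions:

import numpy as np
import librosa

def spectral_subtract(mix, accomp_est, n_fft=2048, hop=512):
    # Subtract an estimated accompaniment magnitude spectrum from the
    # mixture spectrum, clip at zero, resynthesize with the mixture phase.
    M = librosa.stft(mix, n_fft=n_fft, hop_length=hop)
    A = librosa.stft(accomp_est, n_fft=n_fft, hop_length=hop)
    voice_mag = np.maximum(np.abs(M) - np.abs(A), 0.0)
    return librosa.istft(voice_mag * np.exp(1j * np.angle(M)), hop_length=hop)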
Singing Voice Separation with Deep U-Net Convolutional Networks
This work proposes a novel application of the U-Net architecture — initially developed for medical imaging — for the task of source separation, given its proven capacity for recreating the fine, low-level detail required for high-quality audio reproduction.
MUSDB18 - a corpus for music separation
The sigsep musdb18 data set consists of a total of 150 full-track songs of different styles and includes both the stereo mixtures and the original sources, divided between a training subset and a test subset.
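The dataset is typically accessed through the sigsep musdb Python package; a minimal usage sketch, with the root path as a placeholder:

import musdb  # pip install musdb

mus = musdb.DB(root='/path/to/musdb18', subsets='train')
for track in mus:
    mixture = track.audio                            # stereo mixture, (n, 2)
    vocals = track.targets['vocals'].audio           # isolated vocal stem
    accomp = track.targets['accompaniment'].audio    # everything else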
Adversarial Semi-Supervised Audio Source Separation Applied to Singing Voice Extraction
D. Stoller, S. Ewert, S. Dixon • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 2018
This work adopts adversarial training for music source separation, with the aim of driving the separator towards outputs deemed realistic by discriminator networks trained to tell real samples apart from separator outputs.
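In generic GAN notation the idea reads as follows, where S is the separator, D a discriminator, m an unlabeled mixture and s a real solo-source sample; this is the standard adversarial objective, and the paper's precise loss variant may differ:

\min_S \max_D \;\; \mathbb{E}_{s \sim p_{\mathrm{source}}}\big[\log D(s)\big] \;+\; \mathbb{E}_{m \sim p_{\mathrm{mix}}}\big[\log\big(1 - D(S(m))\big)\big]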
SVSGAN: Singing Voice Separation Via Generative Adversarial Network
Experimental results on three datasets show that singing voice separation performance can be improved by the proposed framework, in which conventional separation networks are trained within a generative adversarial network using a time-frequency masking function.
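The time-frequency masking function referred to above is commonly a soft ratio mask applied to the mixture spectrogram; a generic sketch, not SVSGAN's exact network output:

import numpy as np

def apply_soft_mask(mix_stft, vocal_mag, accomp_mag, eps=1e-8):
    # Ratio mask built from estimated source magnitudes, applied to the
    # complex mixture STFT so that the mixture phase is reused.
    mask = vocal_mag / (vocal_mag + accomp_mag + eps)
    return mask * mix_stft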
Improving music source separation based on deep neural networks through data augmentation and network blending
This paper describes two different deep neural network architectures for separating music into individual instrument tracks, one feed-forward and one recurrent, and shows that each yields state-of-the-art results on the SiSEC DSD100 dataset.
Monoaural Audio Source Separation Using Deep Convolutional Neural Networks
A low-latency monaural source separation framework using a convolutional neural network is presented, and the network's performance is evaluated on a database comprising musical mixtures of three instruments, as well as other instruments that vary from song to song.
Deep clustering and conventional networks for music separation: Stronger together
It is shown that deep clustering outperforms conventional networks on a singing voice separation task, in both matched and mismatched conditions, even though conventional networks have the advantage of end-to-end training for best signal approximation.
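For context, deep clustering trains a network to emit an embedding per time-frequency bin and minimizes the standard objective from Hershey et al. (2016), where V stacks the unit-norm embeddings and Y is the ideal binary source-assignment matrix:

\mathcal{L}_{\mathrm{DC}} = \big\lVert V V^{\top} - Y Y^{\top} \big\rVert_F^{2}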
A recurrent encoder-decoder approach with skip-filtering connections for monaural singing voice separation
Results from an objective evaluation show that the proposed method provides results comparable to deep-learning-based methods that operate over complicated signal representations, in contrast to previous methods that approximate time-frequency masks.
Multichannel music separation with deep neural networks
A framework is proposed in which the source spectra are estimated using deep neural networks and combined with spatial covariance matrices encoding the sources' spatial characteristics; these estimates are then used to derive a multichannel Wiener filter.
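The multichannel Wiener filter derived from these quantities takes the standard form under the local Gaussian model, where v_j(f,n) is the estimated power spectral density of source j, R_j(f) its spatial covariance matrix, and x(f,n) the mixture STFT:

W_j(f,n) = v_j(f,n)\, R_j(f) \Big( \sum_{k} v_k(f,n)\, R_k(f) \Big)^{-1}, \qquad \hat{s}_j(f,n) = W_j(f,n)\, x(f,n)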