Corpus ID: 245769673

Self-Supervised Beat Tracking in Musical Signals with Polyphonic Contrastive Learning

Dorian Desblancs
Annotating musical beats is a very long and tedious process. To combat this problem, we present a new self-supervised learning pretext task for beat tracking and downbeat estimation. This task makes use of Spleeter [27], an audio source separation model, to separate a song’s drums from the rest of its signal. The drum signals are used as positives, and by extension negatives, for contrastive learning pre-training. The drum-less signals, on the other hand, are used as anchors… 
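The anchor/positive/negative setup described above is typically trained with an InfoNCE-style contrastive loss. Below is a minimal sketch of that loss in pure Python; the toy embeddings, the temperature value, and the stand-in for an encoder are illustrative assumptions, not details taken from the paper:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: the drum-less mix should sit
    close to its own drum stem (the positive) and far from drum
    stems drawn from other songs (the negatives)."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # cross-entropy with the positive at index 0, via a stable log-sum-exp
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

# toy embeddings standing in for encoder outputs
anchor = [1.0, 0.0]                      # drum-less mix
positive = [0.9, 0.1]                    # its matching drum stem
negatives = [[-1.0, 0.2], [0.0, -1.0]]   # drum stems from other songs
loss = info_nce(anchor, positive, negatives)
```

When the anchor is already aligned with its positive, as in this toy example, the loss is near zero; swapping in a mismatched stem drives it up, which is what pushes the encoder to associate each mix with its own drums.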

Self-Supervised Learning of Audio Representations From Permutations With Differentiable Ranking

This work advances self-supervised learning from permutations by pre-training a model to reorder shuffled spectrogram patches in the time-frequency space, improving downstream classification performance on tasks such as instrument classification and pitch estimation of musical notes.
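The pretext task amounts to cutting a spectrogram into patches, shuffling them, and asking the model to recover the original order. A minimal sketch of how such a training example could be built, assuming a simple split along the time axis and a toy spectrogram (both illustrative, not the paper's exact patching scheme):

```python
import random

def make_permutation_example(spectrogram, n_patches, seed=0):
    """Split a spectrogram (a list of time frames) into n_patches
    contiguous chunks along time, shuffle them, and return the
    shuffled input plus the target ordering a model must predict."""
    frames_per_patch = len(spectrogram) // n_patches
    patches = [spectrogram[i * frames_per_patch:(i + 1) * frames_per_patch]
               for i in range(n_patches)]
    order = list(range(n_patches))
    rng = random.Random(seed)
    rng.shuffle(order)
    shuffled = [patches[i] for i in order]
    # target[i] = position of original patch i in the shuffled input
    target = [order.index(i) for i in range(n_patches)]
    return shuffled, target

# toy "spectrogram": 8 time frames of 2 frequency bins each
spec = [[t, t + 0.5] for t in range(8)]
shuffled, target = make_permutation_example(spec, n_patches=4, seed=1)
```

Reading the shuffled patches back in the order given by `target` restores the original spectrogram, so the target permutation is a free supervisory signal obtained without any manual labels.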

SPICE: Self-Supervised Pitch Estimation

The proposed self-supervised learning technique is able to estimate pitch at a level of accuracy comparable to fully supervised models, on both clean and noisy audio samples, without requiring access to large labeled datasets.

Multi-Task Self-Supervised Pre-Training for Music Classification

This paper applies self-supervised and multi-task learning methods for pre-training music encoders, and explores various design choices including encoder architectures, weighting mechanisms to combine losses from multiple tasks, and worker selections of pretext tasks to investigate how these design choices interact with various downstream music classification tasks.

Learning a Joint Embedding Space of Monophonic and Mixed Music Signals for Singing Voice

A system to learn a joint embedding space of monophonic and mixed tracks for singing voice using a metric learning method, which ensures that tracks from both domains of the same singer are mapped closer to each other than those of different singers.

Contrastive Learning of Musical Representations

It is shown that CLMR’s representations are transferable to out-of-domain datasets, indicating that the method has strong generalisability in music classification. To foster reproducibility and future research on self-supervised learning in music, the models and source code are publicly released.

Contrastive Learning of General-Purpose Audio Representations

This work builds on top of recent advances in contrastive learning for computer vision and reinforcement learning to design a lightweight, easy-to-implement self-supervised model of audio, and shows that despite its simplicity, this method significantly outperforms previous self-supervised systems.

Temporal convolutional networks for musical audio beat tracking

Three highly promising attributes of TCNs for music analysis are demonstrated, namely: they achieve state-of-the-art performance on a wide range of existing beat tracking datasets, they are well suited to parallelisation and thus can be trained efficiently even on very large training data, and they require a small number of weights.
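The abstract's claim that TCNs need few weights rests on dilated convolutions: each layer doubles its dilation, so temporal context grows exponentially with depth while the parameter count grows only linearly. A quick sketch of the receptive-field arithmetic (the kernel size and layer count below are illustrative assumptions, not figures from the paper):

```python
def tcn_receptive_field(kernel_size, dilations):
    """Receptive field, in frames, of a stack of dilated 1-D
    convolutions: each layer extends it by (kernel_size - 1) * dilation."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# doubling dilation rates, as commonly used in TCN beat trackers
dilations = [2 ** i for i in range(11)]  # 1, 2, 4, ..., 1024
rf = tcn_receptive_field(kernel_size=5, dilations=dilations)
```

With these assumed settings the receptive field is 8189 frames, roughly 82 seconds at a typical 100 frames per second, which is long enough to span many bars of music from a very small convolutional stack.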

A Multi-model Approach to Beat Tracking Considering Heterogeneous Music Styles

A new beat tracking algorithm extends an existing state-of-the-art system with a multi-model approach to represent different music styles; it is even able to match human tapping performance.

Using Self-Supervised Learning of Birdsong for Downstream Industrial Audio Classification

It is demonstrated that motorized-sound classification models using self-supervised learning with a dataset of pitch-intensive birdsong, combined with select data augmentation, achieve better results than using the pre-trained pitch model.

Joint Beat and Downbeat Tracking with Recurrent Neural Networks

A recurrent neural network operating directly on magnitude spectrograms is used to model the metrical structure of the audio signals at multiple levels and provides an output feature that clearly distinguishes between beats and downbeats.