Audio Barlow Twins: Self-Supervised Audio Representation Learning
@article{Anton2022AudioBT,
  title   = {Audio Barlow Twins: Self-Supervised Audio Representation Learning},
  author  = {Jonah Anton and Harry Coppock and Pancham Shukla and Bj{\"o}rn Schuller},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2209.14345}
}
The Barlow Twins self-supervised learning objective requires neither negative samples nor asymmetric learning updates, achieving results on a par with the current state of the art in Computer Vision. As such, we present Audio Barlow Twins, a novel self-supervised audio representation learning approach, adapting Barlow Twins to the audio domain. We pre-train on the large-scale audio dataset AudioSet, and evaluate the quality of the learnt representations on 18 tasks from the HEAR 2021…
References
SHOWING 1-10 OF 35 REFERENCES
Contrastive Learning of General-Purpose Audio Representations
- Computer Science, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021
This work builds on top of recent advances in contrastive learning for computer vision and reinforcement learning to design a lightweight, easy-to-implement self-supervised model of audio, and shows that despite its simplicity, this method significantly outperforms previous self-supervised systems.
BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation
- Computer Science, 2021 International Joint Conference on Neural Networks (IJCNN), 2021
Inspired by the recent progress in self-supervised learning for computer vision that generates supervision using data augmentations, we explore a new general-purpose audio representation learning…
BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping
- Computer Science, ArXiv, 2022
This work extends existing methods based on self-supervised learning by bootstrapping, proposes various encoder architectures, and explores the importance of using different pre-training datasets to develop general-purpose audio representations.
A Note on Connecting Barlow Twins with Negative-Sample-Free Contrastive Learning
- Computer Science, ArXiv, 2021
Compared to the prior state-of-the-art SSL methods, Barlow Twins demonstrates two main properties: its algorithm requires no explicit construction of negative sample pairs, and is not sensitive to large training batch sizes.
CLAR: Contrastive Learning of Auditory Representations
- Computer Science, AISTATS, 2021
By combining all these methods and with substantially less labeled data, the CLAR framework achieves a significant improvement in prediction performance compared to the supervised approach, and converges faster with significantly better representations.
BYOL for Audio: Exploring Pre-Trained General-Purpose Audio Representations
- Computer Science, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023
This study hypothesizes that representations effective for general audio tasks should provide multiple aspects of robust features of the input sound and proposes a self-supervised learning method, Bootstrap Your Own Latent for Audio (BYOL-A, pronounced “viola”), which makes the learned representations robust to the perturbations of sounds.
Barlow Twins: Self-Supervised Learning via Redundancy Reduction
- Computer Science, ICML, 2021
This work proposes an objective function that naturally avoids collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible.
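The objective described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the batch-normalisation epsilon and the off-diagonal weight `lam` are assumed values, and `z_a`/`z_b` stand for the projector outputs of the two distorted views.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins objective: drive the cross-correlation matrix of two
    embedding batches toward the identity matrix.

    z_a, z_b: (batch, dim) embeddings of two distorted views of the same samples.
    lam: weight on the off-diagonal (redundancy-reduction) term; illustrative value.
    """
    # Normalise each embedding dimension over the batch (zero mean, unit std).
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-8)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-8)
    n = z_a.shape[0]
    # Empirical cross-correlation matrix, shape (dim, dim).
    c = z_a.T @ z_b / n
    # Invariance term: diagonal entries pulled toward 1.
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()
    # Redundancy-reduction term: off-diagonal entries pulled toward 0.
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag
```

With identical views the diagonal of `c` is already near 1, so the loss is dominated by residual off-diagonal correlations; with unrelated views the invariance term is large.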
Unsupervised Contrastive Learning of Sound Event Representations
- Computer Science, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021
This work proposes the pretext task of contrasting differently augmented views of sound events, and finds that unsupervised contrastive pre-training can mitigate the impact of data scarcity and increase robustness against noisy labels.
SSAST: Self-Supervised Audio Spectrogram Transformer
- Computer Science, AAAI, 2022
This paper proposes to pretrain the Audio Spectrogram Transformer (AST) with joint discriminative and generative masked spectrogram patch modeling (MSPM) using unlabeled audio from AudioSet and LibriSpeech; it is the first patch-based self-supervised learning framework in the audio and speech domain, and also the first self-supervised learning framework for AST.
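The masking step behind MSPM can be illustrated as follows. This is only a rough sketch, not the SSAST recipe: the patch size, masking ratio, and zero-fill strategy are assumptions made for illustration, and the discriminative/generative prediction heads are omitted.

```python
import numpy as np

def mask_spectrogram_patches(spec, patch=(16, 16), mask_ratio=0.4, rng=None):
    """Split a (freq, time) spectrogram into non-overlapping patches and
    zero out a random subset.

    Returns the masked spectrogram and a boolean patch mask
    (True = masked, i.e. the patch the model must reconstruct/identify).
    """
    rng = rng or np.random.default_rng()
    f, t = spec.shape
    pf, pt = patch
    nf, nt = f // pf, t // pt
    # Choose which patches to hide.
    mask = rng.random((nf, nt)) < mask_ratio
    out = spec.copy()
    for i in range(nf):
        for j in range(nt):
            if mask[i, j]:
                out[i * pf:(i + 1) * pf, j * pt:(j + 1) * pt] = 0.0
    return out, mask
```

A pretraining step would then feed `out` to the transformer and score its predictions for the patches where `mask` is True.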