Corpus ID: 231934101

Improving Deep-learning-based Semi-supervised Audio Tagging with Mixup

  title={Improving Deep-learning-based Semi-supervised Audio Tagging with Mixup},
  author={L{\'e}o Cances and {\'E}tienne Labb{\'e} and Thomas Pellegrini},
Recently, semi-supervised learning (SSL) methods, in the framework of deep learning (DL), have been shown to provide state-of-the-art results on image datasets by exploiting unlabeled data. Usually evaluated on object recognition tasks in images, these algorithms are rarely compared when applied to audio tasks. In this article, we adapted four recent SSL methods to the task of audio tagging. The first two methods, namely Deep Co-Training (DCT) and Mean Teacher (MT), involve two…
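Since mixup is central to the paper's title, a minimal sketch of the technique may help: each training example is replaced by a convex combination of two samples and their labels. The function name and the `alpha` default below are illustrative, not taken from the paper.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.4):
    """Mixup: build a virtual training example as a convex combination
    of two samples and their (one-hot / multi-hot) labels.
    The mixing coefficient lam is drawn from a Beta(alpha, alpha)."""
    lam = np.random.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y
```

Because `lam` lies in [0, 1], the mixed label remains a valid probability vector whenever the two input labels are.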
2 Citations



Improving Semi-Supervised Learning for Audio Classification with FixMatch
Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained
A Preliminary Study on Environmental Sound Classification Leveraging Large-Scale Pretrained Model and Semi-Supervised Learning
To simulate a low-resource sound classification setting where only limited supervised examples are made available, the notion of transfer learning is instantiated with a recently proposed training algorithm and a data augmentation method to achieve the goal of semi-supervised model training.


References
Semi-Supervised Audio Classification with Consistency-Based Regularization
This paper incorporates audio-specific perturbations into the Mean Teacher algorithm and demonstrates the effectiveness of the resulting method on audio classification tasks.
Unsupervised Data Augmentation for Consistency Training
A new perspective on how to effectively noise unlabeled examples is presented, and it is argued that the quality of the noise, specifically that produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.
Temporal Ensembling for Semi-Supervised Learning
Self-ensembling is introduced, forming an ensemble prediction for each sample from the network's outputs across training epochs; this ensemble prediction is shown to be a better predictor of the unknown labels than the output of the network at the most recent training epoch, and can thus be used as a target for training.
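The per-sample ensemble target in Temporal Ensembling is an exponential moving average of predictions across epochs, with a startup bias correction. A minimal sketch, assuming predictions are stored as NumPy arrays and using the paper's alpha = 0.6 momentum as a default:

```python
import numpy as np

def temporal_ensemble(Z, z_epoch, t, alpha=0.6):
    """Temporal Ensembling: accumulate an exponential moving average Z of
    per-sample predictions across epochs; the training target divides out
    the startup bias (as in Adam's bias correction).
    Z: running average, z_epoch: current-epoch predictions, t: epoch index."""
    Z = alpha * Z + (1.0 - alpha) * z_epoch
    target = Z / (1.0 - alpha ** (t + 1))
    return Z, target
```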
Deep Co-Training for Semi-Supervised Image Recognition
This paper presents Deep Co-Training, a deep-learning-based method inspired by the Co-Training framework, which outperforms the previous state-of-the-art methods by a large margin in semi-supervised image recognition.
Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning
An unsupervised loss function is proposed that takes advantage of the stochastic nature of these methods and minimizes the difference between the predictions of multiple passes of a training sample through the network.
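The unsupervised loss described above can be sketched as a mean squared difference between the predictions of two stochastic forward passes (e.g. with different dropout or augmentation) of the same input; this is a simplified illustration, not the paper's exact formulation:

```python
import numpy as np

def consistency_loss(p1, p2):
    """Consistency regularization: penalize disagreement between the
    predictions of two stochastic passes of the same training sample."""
    return float(np.mean((p1 - p2) ** 2))
```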
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
The recently proposed Temporal Ensembling has achieved state-of-the-art results on several semi-supervised learning benchmarks but becomes unwieldy on large datasets, so Mean Teacher, a method that averages model weights instead of label predictions, is proposed.
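The weight averaging at the core of Mean Teacher is an exponential moving average (EMA) of the student's parameters. A minimal sketch, representing parameters as dicts of NumPy arrays (the decay value is a common choice, not prescribed here):

```python
import numpy as np

def ema_update(teacher, student, decay=0.999):
    """Mean Teacher: after each student update, move the teacher's weights
    toward the student's via an exponential moving average."""
    for name, w_student in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * w_student
    return teacher
```

The teacher then produces the consistency targets used to train the student on unlabeled data.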
MixMatch: A Holistic Approach to Semi-Supervised Learning
This work unifies the current dominant approaches to semi-supervised learning into a new algorithm, MixMatch, which works by guessing low-entropy labels for data-augmented unlabeled examples and mixing labeled and unlabeled data using MixUp.
Pseudo-Label : The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks
A simple and efficient semi-supervised learning method for deep neural networks is proposed: the network is trained in a supervised fashion on labeled and unlabeled data simultaneously, using its own confident predictions as labels for the unlabeled samples, which favors a low-density separation between classes.
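The pseudo-labeling step can be sketched as taking the argmax class of each unlabeled prediction and keeping only confident ones; the confidence threshold below is illustrative (the original Pseudo-Label method uses all argmax labels with a ramped loss weight):

```python
import numpy as np

def pseudo_labels(probs, threshold=0.95):
    """Assign hard pseudo-labels to unlabeled samples.
    probs: (n_samples, n_classes) predicted probabilities.
    Returns argmax labels and a mask of samples above the threshold."""
    confidence = probs.max(axis=1)
    mask = confidence >= threshold
    return probs.argmax(axis=1), mask
```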
Audio Set: An ontology and human-labeled dataset for audio events
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
Wide Residual Networks
This paper conducts a detailed experimental study on the architecture of ResNet blocks and proposes a novel architecture where the depth of residual networks is decreased and their width increased; the resulting network structures, called wide residual networks (WRNs), are far superior to their commonly used thin and very deep counterparts.