Audio tagging with noisy labels and minimal supervision

@inproceedings{Fonseca2019AudioTW,
  title={Audio tagging with noisy labels and minimal supervision},
  author={Eduardo Fonseca and Manoj Plakal and Frederic Font and Daniel P. W. Ellis and Xavier Serra},
  booktitle={DCASE},
  year={2019}
}
This paper introduces Task 2 of the DCASE 2019 Challenge, titled "Audio tagging with noisy labels and minimal supervision". The task was hosted on the Kaggle platform as "Freesound Audio Tagging 2019". It evaluates systems for multi-label audio tagging using a large set of noisy-labeled data and a much smaller set of manually-labeled data, under a large-vocabulary setting of 80 everyday sound classes. In addition, the proposed dataset poses an acoustic mismatch problem between the noisy…
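Several of the systems listed on this page report their scores as label-weighted label-ranking average precision (lwlrap), the evaluation metric of this task: for each positive label of a clip, precision is measured at that label's rank in the predicted score ordering, and per-class averages are weighted by how often each class appears as a positive label. A minimal NumPy sketch of the metric — the function name and interface are illustrative, not the official baseline implementation:

```python
import numpy as np

def lwlrap(truth, scores):
    """Label-weighted label-ranking average precision (sketch).

    truth:  (n_samples, n_classes) binary ground-truth matrix
    scores: (n_samples, n_classes) real-valued predictions
    """
    truth = np.asarray(truth, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_samples, n_classes = truth.shape
    precisions = np.zeros_like(scores)
    for s in range(n_samples):
        pos = np.flatnonzero(truth[s])
        if pos.size == 0:
            continue
        # rank of every class by descending score (1 = highest)
        order = np.argsort(-scores[s])
        ranks = np.empty(n_classes, dtype=int)
        ranks[order] = np.arange(1, n_classes + 1)
        for c in pos:
            # positives ranked at or above class c, divided by c's rank
            hits = np.sum(truth[s][ranks <= ranks[c]])
            precisions[s, c] = hits / ranks[c]
    # per-class average precision, weighted by each class's
    # share of all positive labels in the evaluation set
    per_class = precisions.sum(axis=0) / np.maximum(truth.sum(axis=0), 1)
    weights = truth.sum(axis=0) / truth.sum()
    return float(np.sum(per_class * weights))
```

Because each class is weighted by its count of positive labels, this is equivalent to averaging the per-label precisions over all positive (sample, class) pairs, which is why lwlrap is robust to the strong class imbalance of the 80-class vocabulary.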


MULTI-LABEL AUDIO TAGGING WITH NOISY LABELS AND VARIABLE LENGTH Technical Report
TLDR
This paper proposes a data generation method named Dominate Mixup, which can restrain the impact of incorrect labels during backpropagation and is suitable for multi-class classification problems.
Multiple Neural Networks with Ensemble Method for Audio Tagging with Noisy Labels and Minimal Supervision
TLDR
This system uses a sigmoid-softmax activation to deal with so-called sparse multi-label classification and an ensemble method that averages models learned with multiple neural networks and various acoustic features, improving label-weighted label-ranking average precision scores.
AUDIO TAGGING WITH CONVOLUTIONAL NEURAL NETWORKS TRAINED WITH NOISY DATA Technical Report
TLDR
An ensemble that provides the likelihood of 80 different labels being present in an input audio clip is obtained by averaging over the predictions of all five networks, and reaches a label-weighted label-ranking average precision (lwlrap) of 0.722.
AUDIO TAGGING WITH MINIMAL SUPERVISION BASED ON MEAN TEACHER FOR DCASE 2019 CHALLENGE TASK 2 Technical Report
TLDR
The mean-teacher-based audio tagging system and its performance on Task 2 of the DCASE 2019 challenge are described; the task evaluates systems for audio tagging with noisy labels and minimal supervision.
Convolutional Recurrent Neural Network and Data Augmentation for Audio Tagging with Noisy Labels and Minimal Supervision
TLDR
This paper proposes a model consisting of a convolutional front end using log-mel-energies as input features, a recurrent neural network sequence encoder and a fully connected classifier network outputting an activity probability for each of the 80 considered event classes.
Staged Training Strategy and Multi-Activation for Audio Tagging with Noisy and Sparse Multi-Label Data
TLDR
This paper proposes a staged training strategy to deal with the noisy labels, and adopts a sigmoid-sparsemax multi-activation structure to deal with the sparse multi-label classification of audio tagging.
Audio Tagging by Cross Filtering Noisy Labels
TLDR
This article presents a novel framework, named CrossFilter, to combat the noisy labels problem for audio tagging, and achieves state-of-the-art performance and even surpasses the ensemble models on FSDKaggle2018 dataset.
DCASE 2019 TASK 2 : SEMI-SUPERVISED NETWORKS WITH HEAVY DATA AUGMENTATIONS TO BATTLE AGAINST LABEL NOISE IN AUDIO TAGGING TASK
TLDR
A semi-supervised teacher-student convolutional neural network is used to leverage the substantial noisy labels and small set of curated labels in the dataset, and adaptive test-time augmentation (TTA) based on the lengths of audio samples is used as a final step to improve the system.
MULTITASK LEARNING AND SEMI-SUPERVISED LEARNING WITH NOISY DATA FOR AUDIO TAGGING Technical Report
TLDR
This paper describes the authors' submission to DCASE 2019 Challenge Task 2, "Audio tagging with noisy labels and minimal supervision", which achieves a score of 0.750 in label-weighted label-ranking average precision (lwlrap).
SPECMIX : A SIMPLE DATA AUGMENTATION AND WARM-UP PIPELINE TO LEVERAGE CLEAN AND NOISY SET FOR EFFICIENT AUDIO TAGGING
TLDR
A semi-supervised warm-up pipeline used to create an efficient audio tagging system is presented, along with a novel data augmentation technique for multi-label audio tagging that the authors name SpecMix.

References

SHOWING 1-10 OF 28 REFERENCES
Learning Sound Event Classifiers from Web Audio with Noisy Labels
TLDR
Experiments suggest that training with large amounts of noisy data can outperform training with smaller amounts of carefully-labeled data, and noise-robust loss functions are shown to be effective in improving performance in the presence of corrupted labels.
General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline
TLDR
The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology.
Learning Sound Events From Webly Labeled Data
TLDR
This work introduces webly labeled learning for sound events which aims to remove human supervision altogether from the learning process, and develops a method of obtaining labeled audio data from the web, in which no manual labeling is involved.
Model-Agnostic Approaches To Handling Noisy Labels When Training Sound Event Classifiers
TLDR
This work evaluates simple and efficient model-agnostic approaches to handling noisy labels when training sound event classifiers, namely label smoothing regularization, mixup and noise-robust loss functions, which can be easily incorporated to existing deep learning pipelines without need for network modifications or extra resources.
Audio Set: An ontology and human-labeled dataset for audio events
TLDR
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
Learning from Noisy Large-Scale Datasets with Minimal Supervision
TLDR
An approach to effectively use millions of images with noisy annotations in conjunction with a small subset of cleanly-annotated images to learn powerful image representations and is particularly effective for a large number of classes with wide range of noise in annotations.
Chime-home: A dataset for sound source recognition in a domestic environment
TLDR
The annotation approach associates each 4-second excerpt from the audio recordings with multiple labels, based on a set of 7 labels associated with sound sources in the acoustic environment, to obtain a representation of "ground truth" in annotations.
CNN architectures for large-scale audio classification
TLDR
This work uses various CNN architectures to classify the soundtracks of a dataset of 70M training videos with 30,871 video-level labels, and investigates varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on the authors' audio classification task, and larger training and label sets help up to a point.
Training deep neural-networks using a noise adaptation layer
TLDR
This study presents a neural-network approach that optimizes the same likelihood function as optimized by the EM algorithm but extended to the case where the noisy labels are dependent on the features in addition to the correct labels.
Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach
TLDR
It is proved that, when ReLU is the only non-linearity, the loss curvature is immune to class-dependent label noise; it is further shown how the label-noise transition probabilities can be estimated, adapting a recent technique for noise estimation to the multi-class setting and providing an end-to-end framework.