Model-Agnostic Approaches To Handling Noisy Labels When Training Sound Event Classifiers

@article{Fonseca2019ModelAgnosticAT,
  title={Model-Agnostic Approaches To Handling Noisy Labels When Training Sound Event Classifiers},
  author={Eduardo Fonseca and Frederic Font and Xavier Serra},
  journal={2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
  year={2019},
  pages={16-20}
}
  • Eduardo Fonseca, Frederic Font, Xavier Serra
  • Published 1 October 2019
  • 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Label noise is emerging as a pressing issue in sound event classification. This arises as we move towards larger datasets that are difficult to annotate manually, but it is even more severe if datasets are collected automatically from online repositories, where labels are inferred through automated heuristics applied to the audio content or metadata. While learning from noisy labels has been an active area of research in computer vision, it has received little attention in sound event… 

Citations

Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework With Loss Masking
TLDR
This work proposes a simple and model-agnostic method based on a teacher-student framework with loss masking to first identify the most critical missing label candidates, and then ignore their contribution during the learning process, finding that a simple optimisation of the training label set improves recognition performance without additional computation.
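The loss-masking idea summarized above can be sketched in a few lines. This is an illustrative NumPy version only: the function name, the binary-cross-entropy choice, and the mask construction are assumptions for the sketch, not details taken from the paper, where the mask would come from a teacher model's predictions.

```python
import numpy as np

def masked_bce(y_true, y_pred, missing_mask, eps=1e-7):
    """Per-(clip, class) binary cross-entropy with flagged
    missing-label candidates excluded from the loss.

    missing_mask[i, c] == True means class c of clip i was flagged
    (e.g. by a teacher model) as a likely missing label, so its
    contribution is ignored rather than treated as a true negative."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    bce = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    keep = ~missing_mask
    return (bce * keep).sum() / keep.sum()  # mean over kept entries only
```

Masking the flagged entry removes its (potentially large) false-negative penalty, which is the "optimisation of the training label set" the TLDR refers to.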
Unsupervised Contrastive Learning of Sound Event Representations
TLDR
This work uses the pretext task of contrasting differently augmented views of sound events, and its results suggest that unsupervised contrastive pre-training can mitigate the impact of data scarcity and increase robustness against noisy labels.
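For concreteness, the standard contrastive objective used in this family of methods (NT-Xent, from SimCLR-style setups) can be written out directly; this NumPy sketch is illustrative and not the paper's exact configuration:

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss over two augmented views.
    z1, z2: (N, d) embeddings of the same N sound events under two
    different augmentations; row i of z1 and row i of z2 are a
    positive pair, all other rows act as negatives."""
    z = np.concatenate([z1, z2], axis=0)               # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize
    sim = z @ z.T / temperature                        # scaled cosine sims
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = z1.shape[0]
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()
```

Embeddings of correctly paired views should score a lower loss than mismatched ones, which is what pre-training exploits.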
Audio tagging with noisy labels and minimal supervision
TLDR
This paper presents the task setup, the FSDKaggle2019 dataset prepared for this scientific evaluation, and a baseline system consisting of a convolutional neural network.
Improving Sound Event Classification by Increasing Shift Invariance in Convolutional Neural Networks
TLDR
This paper evaluates two pooling methods to improve shift invariance in CNNs, based on low-pass filtering and adaptive sampling of incoming feature maps, and shows that these modifications consistently improve sound event classification in all cases considered, without adding any (or adding very few) trainable parameters, which makes them an appealing alternative to conventional pooling layers.
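The low-pass-filtering pooling mentioned above can be sketched in one dimension: blur with a small binomial kernel before subsampling, instead of striding directly. This is an illustrative NumPy version; the paper's exact kernels and its adaptive-sampling variant are not reproduced here.

```python
import numpy as np

def blur_pool_1d(x, stride=2):
    """Anti-aliased downsampling: low-pass filter the signal with a
    small binomial kernel, then subsample. Plain strided subsampling
    aliases high frequencies, making outputs sensitive to 1-sample
    shifts; blurring first suppresses that sensitivity."""
    kernel = np.array([1., 2., 1.]) / 4.0               # binomial low-pass
    padded = np.pad(x, 1, mode='reflect')
    blurred = np.convolve(padded, kernel, mode='valid')  # same length as x
    return blurred[::stride]
```

On an alternating signal, a 1-sample shift flips every plain strided output, while the blurred-then-subsampled output barely changes — the shift-invariance gain the TLDR describes.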
FSD50K: An Open Dataset of Human-Labeled Sound Events
TLDR
FSD50K is introduced, an open dataset containing over 51k audio clips totalling over 100 h of audio, manually labeled using 200 classes drawn from the AudioSet Ontology, to provide an alternative benchmark dataset and thus foster sound event recognition (SER) research.
A Hybrid Parametric-Deep Learning Approach for Sound Event Localization and Detection
TLDR
The proposed methodology relies on parametric spatial audio analysis for source localization and detection, combined with a deep learning-based monophonic event classifier, to reduce the localization error on the evaluation dataset.
Self-Supervised Learning from Automatically Separated Sound Scenes
TLDR
This paper explores the use of unsupervised automatic sound separation to decompose unlabeled sound scenes into multiple semantically-linked views for use in self-supervised contrastive learning and finds that learning to associate input mixtures with their automatically separated outputs yields stronger representations than past approaches that use the mixtures alone.

References

SHOWING 1-10 OF 29 REFERENCES
Learning Sound Event Classifiers from Web Audio with Noisy Labels
TLDR
Experiments suggest that training with large amounts of noisy data can outperform training with smaller amounts of carefully-labeled data, and it is shown that noise-robust loss functions can be effective in improving performance in the presence of corrupted labels.
Training deep neural-networks using a noise adaptation layer
TLDR
This study presents a neural-network approach that optimizes the same likelihood function as optimized by the EM algorithm but extended to the case where the noisy labels are dependent on the features in addition to the correct labels.
Audio tagging with noisy labels and minimal supervision
TLDR
This paper presents the task setup, the FSDKaggle2019 dataset prepared for this scientific evaluation, and a baseline system consisting of a convolutional neural network.
Learning Sound Events From Webly Labeled Data
TLDR
This work introduces webly labeled learning for sound events which aims to remove human supervision altogether from the learning process, and develops a method of obtaining labeled audio data from the web, in which no manual labeling is involved.
Robust Loss Functions under Label Noise for Deep Neural Networks
TLDR
This paper provides some sufficient conditions on a loss function so that risk minimization under that loss function would be inherently tolerant to label noise for multiclass classification problems, and generalizes the existing results on noise-tolerant loss functions for binary classification.
Learning from Noisy Large-Scale Datasets with Minimal Supervision
TLDR
An approach to effectively use millions of images with noisy annotations in conjunction with a small subset of cleanly-annotated images to learn powerful image representations and is particularly effective for a large number of classes with wide range of noise in annotations.
Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach
TLDR
It is proved that, when ReLU is the only non-linearity, the loss curvature is immune to class-dependent label noise; it is further shown how to estimate the label-flip probabilities, adapting a recent technique for noise estimation to the multi-class setting and providing an end-to-end framework.
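The "forward" variant of loss correction is compact enough to sketch: push the model's clean-class posteriors through the label-noise transition matrix T before computing cross-entropy against the observed labels. This NumPy version is illustrative; it assumes `probs` are softmax outputs and T is known or already estimated.

```python
import numpy as np

def forward_corrected_ce(probs, noisy_labels, T, eps=1e-12):
    """Forward loss correction. T[i, j] = P(observed label j | true
    label i); probs are the model's clean-class posteriors (N, C).
    The predicted noisy-label distribution probs @ T is scored
    against the observed, possibly corrupted labels."""
    noisy_probs = probs @ T
    picked = noisy_probs[np.arange(len(noisy_labels)), noisy_labels]
    return -np.log(picked + eps).mean()
```

With T equal to the identity (no noise) this reduces to ordinary cross-entropy, which is a useful sanity check when wiring it into a training loop.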
Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels
TLDR
A theoretically grounded set of noise-robust loss functions that can be seen as a generalization of MAE and CCE are presented and can be readily applied with any existing DNN architecture and algorithm, while yielding good performance in a wide range of noisy label scenarios.
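The generalized cross entropy loss has a simple closed form, L_q = (1 − p_y^q) / q, which interpolates between CCE (as q → 0) and MAE (at q = 1, up to scale). A minimal NumPy sketch, assuming `probs` are softmax outputs and one-hot integer labels:

```python
import numpy as np

def gce_loss(probs, labels, q=0.7):
    """Generalized cross entropy: L_q = (1 - p_y^q) / q, averaged
    over the batch. Small q behaves like standard cross entropy
    (fast learning, noise-sensitive); q = 1 recovers the
    noise-tolerant MAE up to a constant factor."""
    p_y = probs[np.arange(len(labels)), labels]
    return ((1.0 - p_y ** q) / q).mean()
```

The intermediate default q = 0.7 is the trade-off the paper's family of losses is designed to expose: more noise tolerance than CCE, faster convergence than MAE.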
Audio Set: An ontology and human-labeled dataset for audio events
TLDR
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
TLDR
Experimental results demonstrate that learning an auxiliary network, MentorNet, to supervise the training of the base network, StudentNet, can significantly improve the generalization performance of deep networks trained on corrupted training data.
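A heavily simplified stand-in for this kind of curriculum is the "small-loss trick": down-weight the highest-loss samples in each batch, on the assumption that large losses often indicate corrupted labels. MentorNet itself *learns* the weighting with a second network; this fixed-threshold NumPy sketch only illustrates the underlying intuition.

```python
import numpy as np

def small_loss_weights(losses, keep_fraction=0.7):
    """Return 0/1 sample weights that keep the keep_fraction of the
    batch with the smallest per-sample losses, discarding the rest
    as likely label noise."""
    threshold = np.quantile(losses, keep_fraction)
    return (losses <= threshold).astype(float)
```

Multiplying per-sample losses by these weights before averaging steers the optimizer toward samples the model already finds plausible, at the cost of a hyperparameter (the keep fraction) that must track the actual noise rate.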
...