Learning Sound Event Classifiers from Web Audio with Noisy Labels

  @inproceedings{fonseca2019learning,
    title={Learning Sound Event Classifiers from Web Audio with Noisy Labels},
    author={Eduardo Fonseca and Manoj Plakal and Daniel P. W. Ellis and Frederic Font and Xavier Favory and Xavier Serra},
    booktitle={ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    year={2019}
  }
  • Published 4 January 2019
  • Computer Science
As sound event classification moves towards larger datasets, issues of label noise become inevitable. Websites can supply large volumes of user-contributed audio and metadata, but inferring labels from this metadata introduces errors due to unreliable inputs and limitations in the mapping. There is, however, little research into the impact of these errors. To foster the investigation of label noise in sound event classification we present FSDnoisy18k, a dataset containing 42.5 hours of audio… 


Model-Agnostic Approaches To Handling Noisy Labels When Training Sound Event Classifiers
This work evaluates simple and efficient model-agnostic approaches to handling noisy labels when training sound event classifiers, namely label smoothing regularization, mixup, and noise-robust loss functions, which can be easily incorporated into existing deep learning pipelines without the need for network modifications or extra resources.
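As an illustration of two of the model-agnostic techniques named above, here is a minimal NumPy sketch of label smoothing and mixup; the function names and the epsilon/alpha values are illustrative, not taken from the paper.

```python
import numpy as np

def smooth_labels(one_hot, epsilon=0.1):
    # Label smoothing: shift epsilon of the probability mass from the
    # annotated class to a uniform distribution over all classes.
    n_classes = one_hot.shape[-1]
    return one_hot * (1.0 - epsilon) + epsilon / n_classes

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    # mixup: train on a convex combination of two examples and of their
    # (soft) labels; lam is drawn from a Beta(alpha, alpha) distribution.
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2
```

Both operate purely on inputs and targets, which is why they drop into an existing training pipeline without touching the network.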
Audio Tagging by Cross Filtering Noisy Labels
This article presents a novel framework, named CrossFilter, to combat the noisy-label problem in audio tagging; it achieves state-of-the-art performance and even surpasses ensemble models on the FSDKaggle2018 dataset.
Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework With Loss Masking
This work proposes a simple and model-agnostic method based on a teacher-student framework with loss masking to first identify the most critical missing label candidates, and then ignore their contribution during the learning process, finding that a simple optimisation of the training label set improves recognition performance without additional computation.
Audio Tagging using Linear Noise Modelling Layer
Results show that modelling the noise distribution improves the accuracy of the baseline network in a similar capacity to the soft bootstrapping loss.
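A linear noise-modelling layer of this kind is commonly realised as a label-noise transition matrix applied on top of the softmax output; the sketch below (names and matrix values are illustrative, not from the paper) shows the idea.

```python
import numpy as np

def apply_noise_layer(clean_probs, transition):
    # transition[i, j] = P(observed label j | true class i); each row
    # must sum to 1. The layer maps the network's clean-class posterior
    # to a distribution over the noisy observed labels, so the network
    # can be trained against noisy targets while modelling the noise.
    assert np.allclose(transition.sum(axis=1), 1.0)
    return clean_probs @ transition

# Example: 3 classes where class 0 is mislabelled as class 1 20% of the time.
T = np.array([[0.8, 0.2, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
noisy = apply_noise_layer(np.array([1.0, 0.0, 0.0]), T)
```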
Multi-label Audio Tagging with Noisy Labels and Variable Length (DCASE 2019 Challenge technical report)
This paper proposes a data generation method named Dominate Mixup, which can restrain the impact of incorrect labels during back-propagation and is suitable for multi-class classification problems.
The Benefit of Temporally-Strong Labels in Audio Event Classification
  • Shawn Hershey, D. Ellis, M. Plakal
  • Computer Science
    ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
It is shown that fine-tuning with a mix of weakly- and strongly-labeled data can substantially improve classifier performance, even when evaluated using only the original weak labels.
Audio tagging with noisy labels and minimal supervision
This paper presents the task setup, the FSDKaggle2019 dataset prepared for this scientific evaluation, and a baseline system consisting of a convolutional neural network.
The Impact of Missing Labels and Overlapping Sound Events on Multi-label Multi-instance Learning for Sound Event Classification
This paper investigates two state-of-the-art methodologies that allow this type of learning, low-resolution multi-label non-negative matrix deconvolution (LRM-NMD) and a CNN, and shows good robustness to missing labels.
Supervised Classifiers for Audio Impairments with Noisy Labels
It is demonstrated that CNNs can generalize better on training data with a large number of noisy labels and give remarkably higher test performance.
ARCA23K: An audio dataset for investigating open-set label noise
It is shown that the majority of labelling errors in ARCA23K are due to out-of-vocabulary audio clips, and this type of label noise is referred to as open-set label noise.

A Closer Look at Weak Label Learning for Audio Events
This work describes a CNN-based approach for weakly supervised training of audio events, describes important characteristics which naturally arise in weakly supervised learning of sound events, and shows how these aspects of weak labels affect the generalization of models.
DCASE 2018 task 2: iterative training, label smoothing, and background noise normalization for audio event tagging
This paper describes an approach from the submissions for DCASE 2018 Task 2, general-purpose audio tagging of Freesound content with AudioSet labels, and proposes to use pseudo-labels for automatic label verification and label smoothing to reduce over-fitting.
Audio Set: An ontology and human-labeled dataset for audio events
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
Data-efficient weakly supervised learning for low-resource audio event detection using deep learning
A data-efficient training of a stacked convolutional and recurrent neural network is proposed in a multi instance learning setting for which a new loss function is introduced that leads to improved training compared to the usual approaches for weakly supervised learning.
Iterative Learning with Open-set Noisy Labels
A novel iterative learning framework for training CNNs on datasets with open-set noisy labels that detects noisy labels and learns deep discriminative features in an iterative fashion, and designs a Siamese network to encourage the features of clean and noisy examples to be dissimilar.
Training general-purpose audio tagging networks with noisy labels and iterative self-verification
This paper describes our submission to the first Freesound general-purpose audio tagging challenge carried out within the DCASE 2018 challenge. Our proposal is based on a fully convolutional neural network.
Joint Optimization Framework for Learning with Noisy Labels
This work proposes a joint optimization framework of learning DNN parameters and estimating true labels that can correct labels during training by alternating update of network parameters and labels.
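The alternating update can be sketched as follows; the momentum-style soft-label update and the stand-in `probs_fn` are simplifying assumptions for illustration, not the paper's exact scheme (which alternates full network-parameter updates with label updates).

```python
import numpy as np

def alternating_label_update(probs_fn, x, y_soft, steps=3, momentum=0.8):
    # Only the label-update half of the alternation is shown: each step
    # pulls the current (possibly noisy) soft labels toward the model's
    # predictions; in the full method the network is re-trained in between.
    for _ in range(steps):
        y_soft = momentum * y_soft + (1.0 - momentum) * probs_fn(x)
    return y_soft
```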
Semi-supervised learning helps in sound event classification
  • Zixing Zhang, Björn Schuller
  • Computer Science
    2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2012
Adding unlabelled sound event data to the training set, when the classifier's confidence in its automatic labelling is sufficiently high, can significantly enhance classification performance; combined with optimal re-sampling of the originally labelled instances and iterative semi-supervised learning, the gain can reach approximately half of that achieved by using the original manually labelled data.
Learning from Noisy Large-Scale Datasets with Minimal Supervision
An approach to effectively use millions of images with noisy annotations, in conjunction with a small subset of cleanly-annotated images, to learn powerful image representations; it is particularly effective for a large number of classes with a wide range of annotation noise.
Training Deep Neural Networks on Noisy Labels with Bootstrapping
A generic way to handle noisy and incomplete labeling is proposed by augmenting the prediction objective with a notion of consistency: a prediction is consistent if the same prediction is made given similar percepts, where similarity is measured between deep network features computed from the input data.
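The soft bootstrapping objective described above can be written in a few lines; the beta value below is the setting commonly quoted for the soft variant, used here for illustration.

```python
import numpy as np

def soft_bootstrap_loss(probs, one_hot, beta=0.95):
    # Soft bootstrapping: the training target is a convex combination of
    # the given (possibly noisy) label and the model's own prediction,
    # so consistent, confident predictions can down-weight bad labels.
    target = beta * one_hot + (1.0 - beta) * probs
    return -np.sum(target * np.log(probs + 1e-12), axis=-1)
```

With beta = 1 the target reduces to the given label and the loss is ordinary cross-entropy; lowering beta trades label fidelity for prediction consistency.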