Learning With Out-of-Distribution Data for Audio Classification

@article{Iqbal2020LearningWO,
  title={Learning With Out-of-Distribution Data for Audio Classification},
  author={Turab Iqbal and Yin Cao and Qiuqiang Kong and Mark D. Plumbley and Wenwu Wang},
  journal={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2020},
  pages={636-640}
}
  • Published 24 January 2020
In supervised machine learning, the assumption that training data is labelled correctly is not always satisfied. In this paper, we investigate an instance of labelling error for classification tasks in which the dataset is corrupted with out-of-distribution (OOD) instances: data that does not belong to any of the target classes, but is labelled as such. We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning. The proposed… 
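The abstract describes detecting and relabelling certain OOD training instances rather than discarding them. Below is a minimal sketch of that idea using a max-softmax confidence criterion; the function name, scoring rule, and threshold values are illustrative assumptions, not the paper's actual procedure.

import numpy as np

def detect_and_relabel(probs, labels, ood_threshold=0.5, relabel_threshold=0.9):
    """Illustrative OOD handling: `probs` is an (N, C) array of softmax
    outputs from a classifier trained on in-distribution data, `labels`
    the (possibly corrupted) integer labels. Thresholds are assumptions."""
    max_prob = probs.max(axis=1)          # model confidence per instance
    pred = probs.argmax(axis=1)           # model's predicted class
    # Flag an instance as OOD when the model assigns low probability to its
    # given label (a simple max-softmax criterion, not the paper's detector).
    is_ood = probs[np.arange(len(labels)), labels] < ood_threshold

    keep, new_labels = [], []
    for i in range(len(labels)):
        if not is_ood[i]:
            keep.append(i); new_labels.append(labels[i])   # trust the label
        elif max_prob[i] >= relabel_threshold:
            keep.append(i); new_labels.append(pred[i])     # relabel to prediction
        # otherwise discard the instance entirely
    return np.array(keep), np.array(new_labels)

In practice the detector and both thresholds would be tuned on a held-out, cleanly labelled subset.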
ARCA23K: An audio dataset for investigating open-set label noise
TLDR
It is shown that the majority of labelling errors in ARCA23K are due to out-of-vocabulary audio clips, and this type of label noise is referred to as open-set label noise.
SpliceOut: A Simple and Efficient Audio Augmentation Method
Time masking has become a de facto augmentation technique for speech and audio tasks, including automatic speech recognition (ASR) and audio classification, most notably as a part of SpecAugment. …
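For reference, a rough sketch of time masking and a SpliceOut-style variant operating on a (frequency, time) spectrogram; the function names and the max_width parameter are assumptions, not the paper's implementation.

import numpy as np

def time_mask(spec, max_width=20, rng=np.random.default_rng()):
    """SpecAugment-style time masking: zero out a random span of frames.
    `spec` is a (freq, time) array; `max_width` is an assumed hyperparameter."""
    t = spec.shape[1]
    width = rng.integers(0, max_width + 1)
    start = rng.integers(0, max(t - width, 1))
    out = spec.copy()
    out[:, start:start + width] = 0.0
    return out

def splice_out(spec, max_width=20, rng=np.random.default_rng()):
    """SpliceOut-like variant: remove the span and splice the remainder
    together, shortening the spectrogram instead of masking it."""
    t = spec.shape[1]
    width = rng.integers(0, max_width + 1)
    start = rng.integers(0, max(t - width, 1))
    return np.concatenate([spec[:, :start], spec[:, start + width:]], axis=1)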
iDECODe: In-distribution Equivariance for Conformal Out-of-distribution Detection
TLDR
The new method iDECODe, leveraging in-distribution equivariance for conformal OOD detection, relies on a novel base non-conformity measure and a new aggregation method, used in the inductive conformal anomaly detection framework, thereby guaranteeing a bounded false detection rate.
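The conformal machinery referred to here can be sketched generically: given non-conformity scores from an in-distribution calibration set, a test point's p-value bounds the false detection rate. The equivariance-based non-conformity measure specific to iDECODe is not reproduced; any scalar score can be plugged in.

import numpy as np

def conformal_p_value(cal_scores, test_score):
    """Inductive conformal anomaly detection: p-value of a test point's
    non-conformity score against calibration scores from in-distribution data."""
    cal_scores = np.asarray(cal_scores)
    return (1 + np.sum(cal_scores >= test_score)) / (len(cal_scores) + 1)

def is_ood(cal_scores, test_score, epsilon=0.05):
    """Flag OOD when the p-value falls below epsilon; under exchangeability
    the false detection rate on in-distribution data is bounded by epsilon."""
    return conformal_p_value(cal_scores, test_score) < epsilon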
FSD50K: An Open Dataset of Human-Labeled Sound Events
TLDR
FSD50K is introduced, an open dataset containing over 51k audio clips totalling over 100 h of audio manually labeled using 200 classes drawn from the AudioSet Ontology, to provide an alternative benchmark dataset and thus foster SER research.

References

SHOWING 1-10 OF 23 REFERENCES
Training Deep Neural Networks on Noisy Labels with Bootstrapping
TLDR
A generic way to handle noisy and incomplete labeling is proposed by augmenting the prediction objective with a notion of consistency: a prediction is considered consistent if the same prediction is made given similar percepts, where similarity is measured between deep network features computed from the input data.
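The soft variant of this bootstrapping objective blends the given label with the model's own prediction; a sketch in PyTorch, with beta = 0.95 as the commonly used setting.

import torch
import torch.nn.functional as F

def soft_bootstrap_loss(logits, targets, beta=0.95):
    """Soft bootstrapping loss: the one-hot target is blended with the
    model's own softmax prediction, so consistent self-predictions are
    partially trusted in place of potentially noisy labels."""
    probs = F.softmax(logits, dim=1)
    log_probs = F.log_softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes=logits.size(1)).float()
    blended = beta * one_hot + (1.0 - beta) * probs
    return -(blended * log_probs).sum(dim=1).mean()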
General-purpose audio tagging from noisy labels using convolutional neural networks
TLDR
A system using an ensemble of convolutional neural networks trained on log-scaled mel spectrograms to address general-purpose audio tagging challenges and to reduce the effects of label noise is proposed.
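Log-scaled mel spectrograms of the kind mentioned here can be computed with librosa; the sample rate and STFT parameters below are assumptions, not the submission's exact settings.

import librosa
import numpy as np

def log_mel_spectrogram(path, sr=32000, n_mels=64, n_fft=1024, hop_length=512):
    """Compute a log-scaled mel spectrogram from an audio file.
    Parameter values here are illustrative defaults."""
    audio, _ = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)   # (n_mels, frames), in dB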
Training Convolutional Networks with Noisy Labels
TLDR
An extra noise layer is introduced into the network which adapts the network outputs to match the noisy label distribution; its parameters can be estimated as part of the training process, requiring only simple modifications to current training infrastructures for deep networks.
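A sketch of such a noise adaptation layer: a learned row-stochastic matrix maps the base network's class posteriors to the noisy-label distribution. Initialisation and regularisation details are simplified here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NoiseAdaptationLayer(nn.Module):
    """Extra noise layer: a learned C x C transition matrix maps the base
    model's class posteriors to the distribution of the noisy labels."""
    def __init__(self, num_classes):
        super().__init__()
        # Unconstrained parameters; a row-wise softmax keeps the matrix
        # stochastic. Initialised close to the identity.
        self.transition = nn.Parameter(torch.eye(num_classes) * 5.0)

    def forward(self, clean_log_probs):
        q = F.softmax(self.transition, dim=1)      # row-stochastic matrix
        noisy_probs = clean_log_probs.exp() @ q    # p(noisy label | input)
        return torch.log(noisy_probs + 1e-12)      # log-probs for an NLL loss

Training fits the layer's output to the noisy labels (e.g. with an NLL loss); at test time the layer is dropped and the base network's posteriors are used directly.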
Audio Tagging using Linear Noise Modelling Layer
TLDR
Results show that modelling the noise distribution improves the accuracy of the baseline network in a similar capacity to the soft bootstrapping loss.
Learning Sound Event Classifiers from Web Audio with Noisy Labels
TLDR
Experiments suggest that training with large amounts of noisy data can outperform training with smaller amounts of carefully-labeled data, and it is shown that noise-robust loss functions can be effective in improving performance in presence of corrupted labels.
Learning Confidence for Out-of-Distribution Detection in Neural Networks
TLDR
This work proposes a method of learning confidence estimates for neural networks that is simple to implement and produces intuitively interpretable outputs, and addresses the problem of calibrating out-of-distribution detectors.
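The confidence branch idea can be sketched as a loss in which predictions are pulled towards the target in proportion to (1 - c) and low confidence is penalised; lam below is an assumed weighting.

import torch
import torch.nn.functional as F

def confidence_loss(class_logits, conf_logit, targets, lam=0.1):
    """Loss for a network with an added confidence branch. `conf_logit`
    has shape (N, 1); `lam` weights the confidence penalty."""
    probs = F.softmax(class_logits, dim=1)
    conf = torch.sigmoid(conf_logit).clamp(1e-6, 1.0 - 1e-6)
    one_hot = F.one_hot(targets, num_classes=class_logits.size(1)).float()
    # Interpolate the prediction towards the true label when confidence is low.
    interpolated = conf * probs + (1.0 - conf) * one_hot
    task_loss = F.nll_loss(torch.log(interpolated), targets)
    confidence_penalty = -torch.log(conf).mean()
    return task_loss + lam * confidence_penalty

At test time the confidence output c serves directly as the out-of-distribution score.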
Learning from Noisy Labels with Distillation
TLDR
This work proposes a unified distillation framework that uses “side” information, including a small clean dataset and label relations in a knowledge graph, to “hedge the risk” of learning from noisy labels, and proposes a suite of new benchmark datasets to evaluate this task in the Sports, Species and Artifacts domains.
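A simplified view of the distillation step, omitting the knowledge-graph label relations: targets for the student are a blend of the noisy label and the soft prediction of a model trained on the small clean set (lam is an assumed weight).

import torch
import torch.nn.functional as F

def distilled_targets(noisy_labels, clean_model_probs, lam=0.7, num_classes=None):
    """Blend the (possibly noisy) one-hot label with the soft prediction of
    an auxiliary model trained on a small clean subset."""
    num_classes = num_classes or clean_model_probs.size(1)
    one_hot = F.one_hot(noisy_labels, num_classes=num_classes).float()
    return lam * one_hot + (1.0 - lam) * clean_model_probs

# The student is then trained with cross-entropy against these soft targets,
# e.g. loss = -(targets * F.log_softmax(student_logits, dim=1)).sum(1).mean()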
Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks
TLDR
The proposed ODIN method, based on the observation that using temperature scaling and adding small perturbations to the input can separate the softmax score distributions between in- and out-of-distribution images, allowing for more effective detection, consistently outperforms the baseline approach by a large margin.
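A sketch of the ODIN scoring procedure with temperature scaling and input perturbation; the hyperparameter values follow commonly reported settings rather than this paper's tuned ones.

import torch
import torch.nn.functional as F

def odin_score(model, x, temperature=1000.0, epsilon=0.0014):
    """ODIN-style score: temperature-scaled softmax plus a small input
    perturbation that increases the top softmax probability."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x) / temperature
    # Gradient of the negative log of the max softmax probability w.r.t. x.
    loss = -F.log_softmax(logits, dim=1).max(dim=1).values.sum()
    loss.backward()
    x_perturbed = x - epsilon * x.grad.sign()
    with torch.no_grad():
        scores = F.softmax(model(x_perturbed) / temperature, dim=1).max(dim=1).values
    return scores   # higher score -> more likely in-distribution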
Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels
TLDR
A theoretically grounded set of noise-robust loss functions that can be seen as a generalization of MAE and CCE are presented and can be readily applied with any existing DNN architecture and algorithm, while yielding good performance in a wide range of noisy label scenarios.
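The L_q loss itself is compact; a PyTorch sketch with q = 0.7, the value most often reported in the paper's experiments.

import torch
import torch.nn.functional as F

def generalized_cross_entropy(logits, targets, q=0.7):
    """Generalized cross-entropy loss L_q = (1 - p_y^q) / q, which
    interpolates between categorical cross-entropy (q -> 0) and MAE (q = 1)."""
    probs = F.softmax(logits, dim=1)
    p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1).clamp_min(1e-12)
    return ((1.0 - p_y.pow(q)) / q).mean()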
A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks
TLDR
A simple baseline that utilizes probabilities from softmax distributions is presented, and its effectiveness is demonstrated across computer vision, natural language processing, and automatic speech recognition tasks; it is also shown that the baseline can sometimes be surpassed.
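The baseline reduces to scoring each input by its maximum softmax probability; a minimal sketch.

import torch
import torch.nn.functional as F

def max_softmax_score(model, x):
    """Maximum softmax probability as a confidence score; low values suggest
    misclassified or out-of-distribution inputs."""
    with torch.no_grad():
        return F.softmax(model(x), dim=1).max(dim=1).values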