• Corpus ID: 96427854

Training general-purpose audio tagging networks with noisy labels and iterative self-verification

@inproceedings{Dorfer2018TrainingGA,
  title={Training general-purpose audio tagging networks with noisy labels and iterative self-verification},
  author={Matthias Dorfer and Gerhard Widmer},
  booktitle={DCASE},
  year={2018}
}
This paper describes our submission to the first Freesound general-purpose audio tagging challenge carried out within the DCASE 2018 challenge. Our proposal is based on a fully convolutional neural network that predicts one out of 41 possible audio class labels when given an audio spectrogram excerpt as an input. What makes this classification dataset and the task in general special is the fact that only 3,700 of the 9,500 provided training examples are delivered with manually verified ground…

AUDIO TAGGING WITH CONVOLUTIONAL NEURAL NETWORKS TRAINED WITH NOISY DATA Technical Report
TLDR
An ensemble that provides us with the likelihood of 80 different labels being present in an input audio clip is obtained by averaging over the predictions of all five networks, and reaches a Label Weighted Label Ranking Average Precision of 0.722.
Semi-supervised audio tagging with deep co-training and augmentations
  • Computer Science
  • 2020
TLDR
This work proposes to artificially increase the 10% of labeled files by simply duplicating them in the mini-batches during learning, and transforming them with audio data augmentations, and reports experiments on the publicly available UrbanSound8K dataset.
Staged Training Strategy and Multi-Activation for Audio Tagging with Noisy and Sparse Multi-Label Data
TLDR
This paper proposes a staged training strategy to deal with noisy labels, and adopts a sigmoid-sparsemax multi-activation structure to deal with the sparse multi-label classification of audio tagging.
Audio Tagging System using Deep Learning Model
TLDR
The proposed work analyzes large-scale imbalanced audio data for an audio tagging system based on a Convolutional Neural Network with Mel Frequency Cepstral Coefficients and shows the performance of the proposed audio tagging system with an average mean precision.
Weakly supervised CRNN system for sound event detection with large-scale unlabeled in-domain data
TLDR
A state-of-the-art general audio tagging model is first employed to predict weak labels for unlabeled data, and a weakly supervised architecture based on the convolutional recurrent neural network is developed to solve the strong annotations of sound events with the aid of the unlabeled data with predicted labels.
An End-to-End Audio Classification System based on Raw Waveforms and Mix-Training Strategy
TLDR
An end-to-end audio classification system based on raw waveforms and a mix-training strategy is proposed to break the performance limitation caused by the amount of training data; it exceeds the state-of-the-art multi-level attention model.
Learning Sound Event Classifiers from Web Audio with Noisy Labels
TLDR
Experiments suggest that training with large amounts of noisy data can outperform training with smaller amounts of carefully-labeled data, and it is shown that noise-robust loss functions can be effective in improving performance in presence of corrupted labels.
Semi-supervised Triplet Loss Based Learning of Ambient Audio Embeddings
TLDR
This paper combines unsupervised and supervised triplet loss based learning into a semi-supervised representation learning approach, whereby the positive samples for those triplets whose anchors are unlabeled are obtained either by applying a transformation to the anchor, or by selecting the nearest sample in the training set.
LOW-COMPLEXITY ACOUSTIC SCENE CLASSIFICATION USING ONE-BIT-PER-WEIGHT DEEP CONVOLUTIONAL NEURAL NETWORKS Technical Report
TLDR
This technical report describes a submission to Task 1b (“Low-Complexity Acoustic Scene Classification”) in the DCASE2020 Acoustic Scene Challenge, which allowed a single 36-layer all-convolutional deep neural network to be trained, consisting of a total of 3,987,000 binary weights, totalling 486.69KB.
Receptive-field-regularized CNN variants for acoustic scene classification
TLDR
This paper performs a systematic investigation of different RF configurations for various CNN architectures on the DCASE 2019 Task 1.A dataset, introduces Frequency Aware CNNs to compensate for the lack of frequency information caused by the restricted RF, and investigates if and in what RF ranges they yield additional improvement.

References

Training Deep Neural Networks on Noisy Labels with Bootstrapping
TLDR
A generic way to handle noisy and incomplete labeling by augmenting the prediction objective with a notion of consistency is proposed, which considers a prediction consistent if the same prediction is made given similar percepts, where the notion of similarity is between deep network features computed from the input data.
Convolutional gated recurrent neural network incorporating spatial features for audio tagging
TLDR
This paper proposes to use a convolutional neural network (CNN) to extract robust features from mel-filter banks, spectrograms or even raw waveforms for audio tagging to evaluate the proposed methods on Task 4 of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge.
Learning from Noisy Labels with Deep Neural Networks
TLDR
A novel way of modifying deep learning models so they can be effectively trained on data with a high level of label noise is proposed, and it is shown that random images without labels can improve the classification performance.
CP-JKU SUBMISSIONS FOR DCASE-2016 : A HYBRID APPROACH USING BINAURAL I-VECTORS AND DEEP CONVOLUTIONAL NEURAL NETWORKS
TLDR
This report describes the 4 submissions for Task 1 (Audio scene classification) of the DCASE-2016 challenge of the CP-JKU team and proposes a novel i-vector extraction scheme for ASC using both left and right audio channels and a Deep Convolutional Neural Network architecture trained on spectrograms of audio excerpts in end-to-end fashion.
CNN architectures for large-scale audio classification
TLDR
This work uses various CNN architectures to classify the soundtracks of a dataset of 70M training videos with 30,871 video-level labels, and investigates varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on the authors' audio classification task, and larger training and label sets help up to a point.
Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification
TLDR
It is shown that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a “shallow” dictionary learning model with augmentation.
MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
TLDR
Experimental results demonstrate that the proposed novel technique of learning another neural network, called MentorNet, to supervise the training of the base deep networks, namely, StudentNet, can significantly improve the generalization performance of deep networks trained on corrupted training data.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
TLDR
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Robust Active Label Correction
TLDR
An image classification experiment using convolutional neural networks demonstrates that the class-conditional noise model, which can be learned efficiently, can guide re-labeling in real-world applications.
A multi-device dataset for urban acoustic scene classification
TLDR
The acoustic scene classification task of DCASE 2018 Challenge and the TUT Urban Acoustic Scenes 2018 dataset provided for the task are introduced, and the performance of a baseline system in the task is evaluated.