Audio Tagging using Linear Noise Modelling Layer

Sonal Singh, Arjun Pankajakshan, Emmanouil Benetos
Label noise refers to the presence of inaccurate target labels in a dataset. It impedes the performance of deep neural networks (DNNs), which tend to overfit to the noisy labels, so it is important to devise generic methods that counter its effects. FSDnoisy18k is an audio dataset collected with the aim of encouraging research on label noise for sound event classification. The dataset contains ~42.5 hours of audio recordings divided across 20…
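The paper's titular technique, a linear noise-modelling layer, can be illustrated as a label-transition matrix applied on top of the base classifier's softmax output. This is a minimal numpy sketch under that assumption; the function names and the near-identity initialisation are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def noisy_posterior(clean_probs, transition_logits):
    """Apply a linear noise-modelling layer:
    p(noisy = j | x) = sum_i p(clean = i | x) * T[i, j].

    T is a row-stochastic label-transition matrix obtained by softmaxing
    learnable per-row logits, so each clean class distributes its
    probability mass over the observed (possibly noisy) labels.
    """
    T = softmax(transition_logits)   # (C, C), rows sum to 1
    return clean_probs @ T           # (N, C)

# Toy check: a near-identity transition matrix leaves predictions
# almost unchanged, which is the usual initialisation for such layers.
C = 3
clean = softmax(np.random.randn(4, C))
eye_logits = 10.0 * np.eye(C)        # near-identity after row softmax
noisy = noisy_posterior(clean, eye_logits)
```

During training, the cross-entropy loss would be taken against `noisy` (matching the observed labels), while `clean_probs` is used at test time.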
2 Citations


Learning With Out-of-Distribution Data for Audio Classification
Investigates an instance of labelling error for classification tasks in which the dataset is corrupted with out-of-distribution (OOD) instances, and shows that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
Low-Cost Distributed Acoustic Sensor Network for Real-Time Urban Sound Monitoring
Presents a highly scalable, low-cost distributed infrastructure featuring a ubiquitous acoustic sensor network to monitor urban sounds, enabling practitioners to acoustically populate urban spaces and obtain a reliable real-time view of the noises occurring in them.
References
Learning Sound Event Classifiers from Web Audio with Noisy Labels
Experiments suggest that training with large amounts of noisy data can outperform training with smaller amounts of carefully-labeled data, and show that noise-robust loss functions can be effective in improving performance in the presence of corrupted labels.
Training deep neural-networks using a noise adaptation layer
This study presents a neural-network approach that optimizes the same likelihood function as optimized by the EM algorithm but extended to the case where the noisy labels are dependent on the features in addition to the correct labels.
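The feature-dependent variant described above can be sketched by letting the transition matrix itself be a function of the example's features. This is a hypothetical numpy illustration: the parameter shapes `W (C, D, C)` and `b (C, C)` are my own choice, not the paper's notation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def feature_dependent_noisy_probs(clean_probs, features, W, b):
    """Noise adaptation layer whose transition matrix depends on the
    input's features h:  T(h)[i, j] = softmax_j(h @ W[i] + b[i]).

    clean_probs: (N, C) base-model posteriors
    features:    (N, D) hidden features of each example
    W: (C, D, C), b: (C, C) — illustrative learnable parameters.
    """
    # Per-example, per-clean-class logits over noisy labels: (N, C, C)
    logits = np.einsum('nd,idj->nij', features, W) + b
    T = softmax(logits)                      # row-stochastic per example
    return np.einsum('ni,nij->nj', clean_probs, T)

# Shape check with toy data.
rng = np.random.default_rng(0)
N, D, C = 4, 8, 3
feats = rng.normal(size=(N, D))
clean = softmax(rng.normal(size=(N, C)))
W = 0.1 * rng.normal(size=(C, D, C))
b = np.zeros((C, C))
noisy = feature_dependent_noisy_probs(clean, feats, W, b)
```

Because `T` now varies per example, the layer can model noise that depends on what the input looks like, not only on its true class.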
Training Deep Neural Networks on Noisy Labels with Bootstrapping
A generic way to handle noisy and incomplete labeling by augmenting the prediction objective with a notion of consistency is proposed, which considers a prediction consistent if the same prediction is made given similar percepts, where the notion of similarity is between deep network features computed from the input data.
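The consistency objective above is commonly implemented as a "soft bootstrapping" loss, which mixes the given (possibly noisy) target with the model's own current prediction. A minimal numpy sketch, with an illustrative function name and a typical mixing weight:

```python
import numpy as np

def soft_bootstrap_loss(probs, targets, beta=0.95):
    """Soft bootstrapping cross-entropy: replace the one-hot target with
    a convex combination beta * target + (1 - beta) * prediction, so a
    confident, self-consistent prediction is penalised less even when
    it disagrees with a noisy label.
    """
    eps = 1e-12
    mixed = beta * targets + (1.0 - beta) * probs
    return -(mixed * np.log(probs + eps)).sum(axis=-1).mean()

# Toy example: two predictions against one-hot targets.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0]])
loss = soft_bootstrap_loss(probs, targets, beta=0.95)
```

Setting `beta=1.0` recovers the standard cross-entropy, which makes the effect of the bootstrapping term easy to isolate in experiments.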
Learning from Noisy Large-Scale Datasets with Minimal Supervision
An approach to effectively use millions of images with noisy annotations in conjunction with a small subset of cleanly-annotated images to learn powerful image representations and is particularly effective for a large number of classes with wide range of noise in annotations.
CNN architectures for large-scale audio classification
This work uses various CNN architectures to classify the soundtracks of a dataset of 70M training videos with 30,871 video-level labels, and investigates varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on the authors' audio classification task, and larger training and label sets help up to a point.
Audio Set: An ontology and human-labeled dataset for audio events
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
Deep Residual Learning for Image Recognition
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline
The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology.
A Closer Look at Memorization in Deep Networks
The analysis suggests that the notions of effective capacity which are dataset independent are unlikely to explain the generalization performance of deep networks when trained with gradient based methods because training data itself plays an important role in determining the degree of memorization.
mixup: Beyond Empirical Risk Minimization
This work proposes mixup, a simple learning principle that trains a neural network on convex combinations of pairs of examples and their labels, which improves the generalization of state-of-the-art neural network architectures.
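The mixup principle is compact enough to sketch directly: draw a mixing coefficient from a Beta distribution and form convex combinations of two examples and their (one-hot) labels. A minimal numpy version, with an illustrative signature:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """mixup: train on convex combinations of pairs of examples and
    their labels. lam ~ Beta(alpha, alpha); returns the mixed input
    and the corresponding soft target.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2

# Toy example: mix two feature vectors with one-hot labels.
rng = np.random.default_rng(42)
x1, x2 = rng.normal(size=5), rng.normal(size=5)
y1 = np.array([1.0, 0.0])
y2 = np.array([0.0, 1.0])
xm, ym = mixup(x1, y1, x2, y2, alpha=0.2, rng=rng)
```

The mixed soft target `ym` stays a valid distribution, which is why mixup acts as a label-noise regulariser: the network is never asked to be fully confident on interpolated inputs.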