Corpus ID: 53974579

General-purpose audio tagging from noisy labels using convolutional neural networks

@inproceedings{iqbal2018general,
  title={General-purpose audio tagging from noisy labels using convolutional neural networks},
  author={Turab Iqbal and Qiuqiang Kong and Mark D. Plumbley and Wenwu Wang},
}
General-purpose audio tagging refers to classifying sounds of a diverse nature, and is relevant in many applications where domain-specific information cannot be exploited. The basis of the proposed system is an ensemble of convolutional neural networks trained on log-scaled mel spectrograms. Preprocessing and data augmentation methods are used to improve performance further. To reduce the effects of label noise, two techniques are proposed: loss function weighting and pseudo-labeling…
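The two noise-handling techniques named in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under assumed conventions (a boolean `verified` flag per sample and a fixed down-weighting factor), not the authors' implementation:

```python
import numpy as np

def weighted_cross_entropy(probs, labels, verified, noisy_weight=0.5):
    """Per-sample cross-entropy, down-weighting samples whose labels were
    not manually verified (a sketch of loss-function weighting)."""
    eps = 1e-12
    ce = -np.log(probs[np.arange(len(labels)), labels] + eps)
    weights = np.where(verified, 1.0, noisy_weight)  # assumed weighting scheme
    return np.mean(weights * ce)

def pseudo_label(probs, labels, verified, threshold=0.9):
    """Replace an unverified label with the model's prediction when the
    model is sufficiently confident (a sketch of pseudo-labeling)."""
    preds = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= threshold
    relabel = (~np.asarray(verified)) & confident
    return np.where(relabel, preds, labels)
```

For example, an unverified sample labelled class 1 on which the model predicts class 0 with probability 0.95 would be relabelled to class 0, while a verified sample is left untouched.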


Audio Tagging by Cross Filtering Noisy Labels
This article presents a novel framework, named CrossFilter, to combat the noisy-label problem in audio tagging; it achieves state-of-the-art performance and even surpasses ensemble models on the FSDKaggle2018 dataset.
Staged Training Strategy and Multi-Activation for Audio Tagging with Noisy and Sparse Multi-Label Data
This paper proposes a staged training strategy to deal with noisy labels, and adopts a sigmoid-sparsemax multi-activation structure to deal with the sparse multi-label classification of audio tagging.
Audio Tagging System using Deep Learning Model
The proposed work analyzes large-scale imbalanced audio data for an audio tagging system based on a convolutional neural network with mel-frequency cepstral coefficients, and reports the performance of the proposed system in terms of mean average precision.
Supervised Classifiers for Audio Impairments with Noisy Labels
It is demonstrated that CNNs can generalize better on training data with a large number of noisy labels and give remarkably higher test performance.
Learning With Out-of-Distribution Data for Audio Classification
An instance of labelling error for classification tasks in which the dataset is corrupted with out-of-distribution (OOD) instances is investigated, and it is shown that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning
This work proposes a method for generating sounds via neural discrete time-frequency representation learning, conditioned on sound classes, which offers an advantage in efficiently modelling long-range dependencies and retaining local fine-grained structures within sound clips.
Recipes for Post-training Quantization of Deep Neural Networks
An in-depth analysis of different types of networks for audio, computer vision, medical, and hand-held manufacturing tool use cases is presented; each is compressed with fixed and adaptive quantization and with fixed and variable bit widths for the individual tensors.
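The fixed-bit-width quantization mentioned above can be illustrated in a few lines. This is a generic affine (asymmetric) quantization sketch, one common post-training scheme, and not the specific recipes from the paper:

```python
import numpy as np

def quantize(w, bits=8):
    """Affine quantization of a float tensor to unsigned `bits`-bit integers."""
    qmax = 2 ** bits - 1
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.round((w - lo) / scale).astype(np.int32)
    return q, scale, lo

def dequantize(q, scale, zero):
    """Map integers back to the float range covered by the original tensor."""
    return q * scale + zero

w = np.array([-1.0, -0.25, 0.0, 0.5, 1.0])
q, scale, zero = quantize(w, bits=8)
w_hat = dequantize(q, scale, zero)
```

The reconstruction error of each element is bounded by half a quantization step (scale / 2), which is the trade-off being analyzed when comparing bit widths.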
Adversarial Attacks in Sound Event Classification
This paper applies different gradient based adversarial attack algorithms on five deep learning models trained for sound event classification to show that adversarial attacks can be generated with high confidence and low perturbation.
Data Augmentation Schemes for Deep Learning in an Indoor Positioning Application
The proposed schemes demonstrate the feasibility of data augmentation using a deep neural network (DNN)-based indoor localization system that lowers the complexity required for use on mobile devices.
A Study on the Transferability of Adversarial Attacks in Sound Event Classification
This work demonstrates differences in transferability properties from those observed in computer vision, and shows that dataset normalization techniques such as z-score normalization do not affect the transferability of adversarial attacks, and that techniques such as knowledge distillation do not increase the transferability of attacks.


Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network
In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won 1st place in the large-scale weakly supervised audio classification task.
Mixup-Based Acoustic Scene Classification Using Multi-Channel Convolutional Neural Network
This paper explores the use of a multi-channel CNN for the classification task, which aims to extract features from different channels in an end-to-end manner, and explores the use of the mixup method, which provides higher prediction accuracy and robustness compared with previous models.
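The mixup method referenced above is simple enough to sketch: each training example is a convex combination of two examples and their one-hot labels, with the mixing weight drawn from a Beta distribution (the alpha value below is an assumed hyperparameter):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mix two examples and their one-hot labels with a Beta-distributed weight."""
    if rng is None:
        rng = np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)  # mixing coefficient in (0, 1)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```

Because the label is mixed as well, the model is trained against soft targets, which is one explanation for the robustness gains reported with mixup.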
CNN architectures for large-scale audio classification
This work uses various CNN architectures to classify the soundtracks of a dataset of 70M training videos with 30,871 video-level labels, and investigates varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on the authors' audio classification task, and larger training and label sets help up to a point.
Training Convolutional Networks with Noisy Labels
An extra noise layer is introduced into the network which adapts the network outputs to match the noisy label distribution; this layer can be estimated as part of the training process and requires only simple modifications to current training infrastructures for deep networks.
Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach
It is proved that, when ReLU is the only non-linearity, the loss curvature is immune to class-dependent label noise, and it is shown how one can estimate these probabilities, adapting a recent technique for noise estimation to the multi-class setting, and providing an end-to-end framework.
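The forward loss-correction idea can be written compactly: given an estimated noise-transition matrix T with T[i, j] = P(noisy label j | true label i), the model's clean-class probabilities are pushed through T before computing cross-entropy against the noisy labels. A NumPy sketch, using a toy hand-written matrix rather than an estimated one:

```python
import numpy as np

def forward_corrected_ce(probs, noisy_labels, T):
    """Cross-entropy against noisy labels after mapping the model's
    clean-label probabilities through the noise-transition matrix T."""
    eps = 1e-12
    noisy_probs = probs @ T  # P(noisy = j) = sum_i P(true = i) * T[i, j]
    ce = -np.log(noisy_probs[np.arange(len(noisy_labels)), noisy_labels] + eps)
    return ce.mean()

# toy 2-class example: each label is flipped with probability 0.2
T = np.array([[0.8, 0.2],
              [0.2, 0.8]])
```

A model that is certain of the true class still incurs a loss of -log(0.8) here, reflecting that the observed label itself is unreliable.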
Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge
The emergence of deep learning as the most popular classification method is observed, replacing the traditional approaches based on Gaussian mixture models and support vector machines.
A hybrid approach with multi-channel i-vectors and convolutional neural networks for acoustic scene classification
A novel multi-channel i-vector extraction and scoring scheme for ASC and a CNN architecture that achieves promising ASC results are proposed, and it is shown that i-vectors and CNNs capture complementary information from acoustic scenes.
Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification
It is shown that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a “shallow” dictionary learning model with augmentation.
Environmental sound classification with convolutional neural networks
  • Karol J. Piczak
  • Computer Science
    2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP)
  • 2015
The model outperforms baseline implementations relying on mel-frequency cepstral coefficients and achieves results comparable to other state-of-the-art approaches.
Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks
This study supports the hypothesis that time-frequency representations are valuable for learning useful features for sound classification. It observes that the optimal window size during transformation depends on the characteristics of the audio signal, and that, architecturally, 2D convolution yielded better results than 1D in most cases.