General-purpose audio tagging from noisy labels using convolutional neural networks
@inproceedings{Iqbal2018GeneralpurposeAT, title={General-purpose audio tagging from noisy labels using convolutional neural networks}, author={Turab Iqbal and Qiuqiang Kong and Mark D. Plumbley and Wenwu Wang}, booktitle={DCASE}, year={2018} }
General-purpose audio tagging refers to classifying sounds that are of a diverse nature, and is relevant in many applications where domain-specific information cannot be exploited. The basis of our system is an ensemble of convolutional neural networks trained on log-scaled mel spectrograms. We use preprocessing and data augmentation methods to improve the performance further. To reduce the effects of label noise, two techniques are proposed: loss function weighting and pseudo-labeling…
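The abstract names an ensemble and two label-noise techniques but does not spell them out. The sketch below is one plausible reading, not the authors' implementation. It assumes each training clip carries a hypothetical `verified` flag saying whether its label was manually checked, down-weights the cross-entropy of unverified clips by a hypothetical `unverified_weight`, and relabels an unverified clip with the ensemble's predicted class when that prediction is confident (the `threshold` value is also an assumption).

```python
import math


def ensemble_average(model_outputs):
    """Element-wise mean of the class probabilities produced by several models."""
    n = len(model_outputs)
    return [sum(column) / n for column in zip(*model_outputs)]


def weighted_cross_entropy(probs, label, verified, unverified_weight=0.5):
    """Cross-entropy for one clip, scaled down if its label is unverified.

    Assumption: `verified` and `unverified_weight` are hypothetical; the
    abstract only says the loss function is weighted.
    """
    weight = 1.0 if verified else unverified_weight
    return -weight * math.log(probs[label])


def pseudo_label(probs, label, verified, threshold=0.9):
    """Return a possibly corrected label for one clip.

    Hypothetical rule: an unverified label is replaced by the predicted
    class when the prediction disagrees with it and is confident enough.
    """
    if verified:
        return label
    predicted = max(range(len(probs)), key=probs.__getitem__)
    if predicted != label and probs[predicted] >= threshold:
        return predicted
    return label
```

For example, averaging `[0.9, 0.05, 0.05]` and `[0.95, 0.03, 0.02]` gives a confident vote for class 0, so an unverified clip labelled 2 would be relabelled 0, while a verified clip keeps its label. Whether the actual system averages probabilities or combines its models differently is not stated here.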
11 Citations
Audio Tagging by Cross Filtering Noisy Labels
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2020
This article presents a novel framework, named CrossFilter, to combat the noisy-label problem in audio tagging; it achieves state-of-the-art performance on the FSDKaggle2018 dataset, even surpassing ensemble models.
Staged Training Strategy and Multi-Activation for Audio Tagging with Noisy and Sparse Multi-Label Data
- Computer Science · ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
This paper proposes a staged training strategy to deal with noisy labels, and adopts a sigmoid-sparsemax multi-activation structure to deal with the sparse multi-label classification of audio tagging.
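Sparsemax (Martins and Astudillo, 2016) is the part of the multi-activation structure that is easiest to show in isolation: it projects a score vector onto the probability simplex and, unlike softmax, can assign exact zeros, which suits label sets where only a few classes are active. The sketch implements only the standard sparsemax projection; how the cited paper combines it with a sigmoid branch is not shown.

```python
def sparsemax(z):
    """Euclidean projection of scores z onto the probability simplex.

    Sort the scores in decreasing order, find the largest k with
    1 + k * z_(k) > sum of the top-k scores, and shift every score by
    the threshold tau = (sum of the top-k scores - 1) / k, clipping at 0.
    """
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    k_max, cumsum_at_k = 0, 0.0
    for k, value in enumerate(z_sorted, start=1):
        cumsum += value
        if 1 + k * value > cumsum:  # holds for a prefix of k values
            k_max, cumsum_at_k = k, cumsum
    tau = (cumsum_at_k - 1) / k_max
    return [max(v - tau, 0.0) for v in z]
```

For instance, `sparsemax([1.0, 0.0, -1.0])` puts all mass on the first class, whereas softmax would give every class a non-zero share.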
Audio Tagging System using Deep Learning Model 1950
- Computer Science
- 2019
The proposed work analyzes large-scale, imbalanced audio data for an audio tagging system based on a Convolutional Neural Network with Mel Frequency Cepstral Coefficients, and reports the performance of the proposed system in terms of mean average precision.
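A common precision-based summary metric for tagging is mean average precision (mAP): average precision (AP) is computed per class from the ranked scores, then averaged over classes. A minimal sketch, assuming scores and binary labels are given as plain nested lists (rows are clips, columns are classes); tied scores are not treated specially, and the exact variant used in the cited work may differ.

```python
def average_precision(scores, labels):
    """AP for one class: mean of precision@rank taken at each positive clip."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """Mean of per-class AP; rows are clips, columns are classes."""
    n_classes = len(score_matrix[0])
    aps = [
        average_precision([row[c] for row in score_matrix],
                          [row[c] for row in label_matrix])
        for c in range(n_classes)
    ]
    return sum(aps) / n_classes
```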
Supervised Classifiers for Audio Impairments with Noisy Labels
- Computer Science · INTERSPEECH
- 2019
It is demonstrated that a CNN trained on data with a large number of noisy labels can generalize better, giving remarkably higher test performance.
Learning With Out-of-Distribution Data for Audio Classification
- Computer Science · ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
This work investigates a form of labelling error for classification tasks in which the dataset is corrupted with out-of-distribution (OOD) instances, and shows that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
Recipes for Post-training Quantization of Deep Neural Networks
- Computer Science
- 2020
An in-depth analysis of different types of networks for audio, computer vision, medical, and hand-held manufacturing tool use cases is presented; each is compressed with fixed and adaptive quantization and with fixed and variable bit widths for the individual tensors.
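Fixed-bit-width post-training quantization can be illustrated with uniform affine quantization of one tensor: a scale and a zero point map the tensor's float range (widened to include 0) onto the integers 0 to 2^b - 1. This is a generic sketch, not the recipe from the cited paper. Calling it with a different `num_bits` per tensor mimics the variable-bit-width setting; adaptive range selection is not shown.

```python
def quantize_tensor(values, num_bits=8):
    """Uniform affine quantization of a flat list of floats.

    Returns integer codes plus the (scale, zero_point) needed to dequantize.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Widen the range so that 0.0 maps to an exact integer code.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Guard against a zero range (e.g. an all-zero tensor).
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    # Round to the nearest code and clamp into the representable range.
    codes = [min(max(round(v / scale) + zero_point, qmin), qmax) for v in values]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Map integer codes back to approximate float values."""
    return [(q - zero_point) * scale for q in codes]
```

With fewer bits the step size `scale` grows, so the reconstruction error per value grows with it; the error here is bounded by roughly one step.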
Adversarial Attacks in Sound Event Classification
- Computer Science · ArXiv
- 2019
This paper applies different gradient-based adversarial attack algorithms on five deep learning models trained for sound event classification to show that adversarial attacks can be generated with high confidence and low perturbation.
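As an example of a gradient-based attack, the fast gradient sign method (FGSM, Goodfellow et al., 2015) moves an input by `epsilon` in the sign direction of the loss gradient. To stay self-contained, this sketch attacks a logistic-regression classifier whose gradient can be written by hand; the cited paper attacks deep sound-event models, and which specific algorithms it uses is not listed here.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def _sign(g):
    # Returns +1, -1, or 0 depending on the sign of g.
    return (g > 0) - (g < 0)


def fgsm_logistic(x, y, w, b, epsilon):
    """FGSM against p(y=1|x) = sigmoid(w.x + b) with binary cross-entropy.

    For this model d(loss)/dx = (p - y) * w, so each feature moves by
    epsilon in the direction that increases the loss.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    return [xi + epsilon * _sign(g) for xi, g in zip(x, grad)]
```

For example, with `w = [1.0, -2.0]`, `b = 0.0` and a positive example `x = [0.5, -0.25]` (logit 1.0), a step of `epsilon = 0.5` yields `[0.0, 0.25]`, whose logit is -0.5, so the prediction flips.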
Data Augmentation Schemes for Deep Learning in an Indoor Positioning Application
- Computer Science · Electronics
- 2019
The proposed schemes demonstrate the feasibility of data augmentation using a deep neural network (DNN)-based indoor localization system that lowers the complexity required for use on mobile devices.
A Study on the Transferability of Adversarial Attacks in Sound Event Classification
- Computer Science · ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
This work demonstrates differences in transferability properties from those observed in computer vision, and shows that dataset normalization techniques such as z-score normalization do not affect the transferability of adversarial attacks and that techniques such as knowledge distillation do not increase the transferability of attacks.
Robustness of Adversarial Attacks in Sound Event Classification
- Computer Science, Mathematics · DCASE
- 2019
This paper investigates the robustness of adversarial examples to simple input transformations such as mp3 compression, resampling, white noise and reverb in the task of sound event classification to provide insights on strengths and weaknesses in current adversarial attack algorithms and provide a baseline for defenses against adversarial attacks.
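Of the input transformations listed, additive white noise is the simplest to reproduce. A minimal sketch, assuming the noise level is specified as a signal-to-noise ratio in dB relative to the average signal power. The noise standard deviation is chosen so that the expected noise power matches the target, so the realized SNR varies slightly between runs; the exact noise levels used in the cited paper are not given here.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=None):
    """Add zero-mean Gaussian white noise at a target SNR (in dB).

    SNR_dB = 10 * log10(P_signal / P_noise), so
    P_noise = P_signal / 10 ** (SNR_dB / 10).
    """
    rng = random.Random(seed)
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    std = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, std) for s in signal]
```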
References
Showing 10 of 36 references.
Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network
- Computer Science · 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won 1st place in the large-scale weakly…
Mixup-Based Acoustic Scene Classification Using Multi-Channel Convolutional Neural Network
- Computer Science · PCM
- 2018
This paper explores the use of a multi-channel CNN for the classification task, which aims to extract features from different channels in an end-to-end manner, and also explores the mixup method, which can provide higher prediction accuracy and robustness compared with previous models.
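Mixup (Zhang et al., 2018) trains on convex combinations of pairs of examples: a weight λ is drawn from Beta(α, α), and both the inputs and the one-hot labels are mixed with it. A minimal sketch on flattened feature vectors (for example, a flattened spectrogram), with α = 0.2 as an assumed default; the multi-channel CNN itself is not shown.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Mix two examples and their one-hot labels with lam ~ Beta(alpha, alpha).

    Returns the mixed input, the mixed (soft) label, and the weight used.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

Small α concentrates λ near 0 or 1, so most mixed examples stay close to one of the originals; passing a seeded `random.Random` instance as `rng` makes the draw reproducible.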
CNN architectures for large-scale audio classification
- Computer Science · 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017
This work uses various CNN architectures to classify the soundtracks of a dataset of 70M training videos with 30,871 video-level labels, and investigates varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on the authors' audio classification task, and larger training and label sets help up to a point.
Training Convolutional Networks with Noisy Labels
- Computer Science · ICLR 2014
- 2014
An extra noise layer is introduced into the network that adapts the network outputs to match the noisy label distribution; this layer can be estimated as part of the training process, and the approach involves only simple modifications to current training infrastructures for deep networks.
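The noise layer can be read as a transition matrix Q, where Q[i][j] is the probability that a clip whose true class is i carries the noisy label j: the network's clean-class probabilities p are mapped to noisy-label probabilities Qᵀp, and the loss is computed against the observed noisy label. In the cited work the layer is estimated during training; in this sketch Q is fixed and supplied by hand, purely to show the forward computation.

```python
import math


def noisy_label_probs(clean_probs, transition):
    """Compute Q^T p: probabilities of each noisy label given clean-class probs.

    transition[i][j] is the assumed probability that true class i is
    observed with noisy label j (each row sums to 1).
    """
    n_noisy = len(transition[0])
    return [
        sum(clean_probs[i] * transition[i][j] for i in range(len(clean_probs)))
        for j in range(n_noisy)
    ]


def noise_layer_nll(clean_probs, noisy_label, transition):
    """Negative log-likelihood of the observed noisy label after the noise layer."""
    return -math.log(noisy_label_probs(clean_probs, transition)[noisy_label])
```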
Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach
- Computer Science · 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
It is proved that, when ReLU is the only non-linearity, the loss curvature is immune to class-dependent label noise; the paper also shows how to estimate the label-noise probabilities, adapting a recent noise-estimation technique to the multi-class setting, and provides an end-to-end framework.
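One way to estimate label-noise probabilities, in the spirit of the anchor-point estimators this line of work builds on, is to train a network on the noisy labels and, for each class i, treat the example it most confidently assigns to i as if that example truly belonged to i; that example's predicted noisy-label distribution becomes row i of the transition matrix T. This sketch uses the plain arg-max anchor; the cited paper's exact estimator may differ, for example in how anchors are chosen.

```python
def estimate_transition(noisy_probs):
    """Estimate T[i][j] ~ p(noisy label j | true class i) from model outputs.

    noisy_probs: per-example probability vectors p(noisy label | x) from a
    model trained on the noisy labels. For each class i, the example with
    the highest p(i | x) serves as an anchor for class i.
    """
    n_classes = len(noisy_probs[0])
    rows = []
    for i in range(n_classes):
        anchor = max(noisy_probs, key=lambda p: p[i])
        rows.append(list(anchor))  # copy so T does not alias the inputs
    return rows
```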
Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2018
The emergence of deep learning as the most popular classification method is observed, replacing the traditional approaches based on Gaussian mixture models and support vector machines.
A hybrid approach with multi-channel i-vectors and convolutional neural networks for acoustic scene classification
- Computer Science · 2017 25th European Signal Processing Conference (EUSIPCO)
- 2017
A novel multi-channel i-vector extraction and scoring scheme for ASC and a CNN architecture that achieves promising ASC results are proposed, and it is shown that i-vectors and CNNs capture complementary information from acoustic scenes.
Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification
- Computer Science · IEEE Signal Processing Letters
- 2017
It is shown that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a “shallow” dictionary learning model with augmentation.
Environmental sound classification with convolutional neural networks
- Computer Science · 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP)
- 2015
The model outperforms baseline implementations relying on mel-frequency cepstral coefficients and achieves results comparable to other state-of-the-art approaches.
Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks
- Computer Science · ArXiv
- 2017
This study supports the hypothesis that time-frequency representations are valuable in learning useful features for sound classification. It observes that the optimal window size during transformation depends on the characteristics of the audio signal and that, architecturally, 2D convolution yielded better results than 1D in most cases.