Audio Tagging by Cross Filtering Noisy Labels
@article{Zhu2020AudioTB,
  title   = {Audio Tagging by Cross Filtering Noisy Labels},
  author  = {Boqing Zhu and Kele Xu and Qiuqiang Kong and Huaimin Wang and Yuxing Peng},
  journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  year    = {2020},
  volume  = {28},
  pages   = {2073--2083}
}
High-quality labeled datasets have allowed deep learning to achieve impressive results on many sound analysis tasks. Yet it is labor-intensive to accurately annotate large amounts of audio data, and in practical settings datasets may contain noisy labels. Meanwhile, deep neural networks are susceptible to such incorrectly labeled data because of their strong memorization ability. In this article, we present a novel framework, named CrossFilter, to combat the noisy label problem…
6 Citations
ARCA23K: An audio dataset for investigating open-set label noise
- Computer Science · DCASE · 2021
It is shown that the majority of labelling errors in ARCA23K are due to out-of-vocabulary audio clips, and this type of label noise is referred to as open-set label noise.
Polyphonic training set synthesis improves self-supervised urban sound classification.
- Computer Science · The Journal of the Acoustical Society of America · 2021
A two-stage approach pre-trains audio classifiers on a task whose ground truth is trivially available; it benefits overall performance more than self-supervised learning alone, and the geographical origin of the acoustic events used in training set synthesis appears to have a decisive impact.
Audio Tagging Using CNN Based Audio Neural Networks for Massive Data Processing
- Computer Science · December 2021
A large-scale audio dataset is used to pre-train an audio neural network that outperforms existing systems with a mean average precision of 0.45, and the model's capability is demonstrated by applying the audio neural network to five specific audio pattern recognition tasks.
Self-Supervised Learning from Automatically Separated Sound Scenes
- Computer Science · IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) · 2021
This paper explores the use of unsupervised automatic sound separation to decompose unlabeled sound scenes into multiple semantically-linked views for use in self-supervised contrastive learning and finds that learning to associate input mixtures with their automatically separated outputs yields stronger representations than past approaches that use the mixtures alone.
Multimodal Deep Learning for Social Media Popularity Prediction With Attention Mechanism
- Computer Science · ACM Multimedia · 2020
A novel multimodal deep learning framework for the popularity prediction task, which aims to leverage the complementary knowledge from different modalities, is proposed and results show that the proposed framework outperforms related approaches.
Multi-Scale Generalized Attention-Based Regional Maximum Activation of Convolutions for Beauty Product Retrieval
- Computer Science · ACM Multimedia · 2020
This paper proposes a novel descriptor, Multi-Scale Generalized Attention-Based Regional Maximum Activation of Convolutions (MS-GRMAC), which introduces a multi-scale generalized attention mechanism to reduce the influence of scale variations and can thus boost the performance of the retrieval task.
References
Showing 1–10 of 52 references
Audio tagging with noisy labels and minimal supervision
- Computer Science · DCASE · 2019
This paper presents the task setup, the FSDKaggle2019 dataset prepared for this scientific evaluation, and a baseline system consisting of a convolutional neural network.
General-purpose audio tagging from noisy labels using convolutional neural networks
- Computer Science · DCASE · 2018
A system using an ensemble of convolutional neural networks trained on log-scaled mel spectrograms to address general-purpose audio tagging challenges and to reduce the effects of label noise is proposed.
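The log-scaled mel spectrogram front end used by systems like this one can be sketched end to end in NumPy. Everything here is a generic illustration, not the cited system's exact configuration: the frame length (`n_fft=1024`), hop size, and `n_mels=64` are assumed values.

```python
import numpy as np

def log_mel_spectrogram(y, sr=22050, n_fft=1024, hop=512, n_mels=64):
    """Log-scaled mel spectrogram (illustrative parameters, not the paper's)."""
    # Frame the signal, apply a Hann window, take the FFT power spectrum.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft//2 + 1)

    # Build a triangular mel filterbank.
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge

    # Apply the filterbank and move to a log (dB) scale.
    mel = power @ fb.T
    return 10.0 * np.log10(np.maximum(mel, 1e-10))
```

In practice, `librosa.feature.melspectrogram` followed by `librosa.power_to_db` gives an equivalent and better-tested implementation of the same pipeline.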
Learning Sound Event Classifiers from Web Audio with Noisy Labels
- Computer Science · IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) · 2019
Experiments suggest that training with large amounts of noisy data can outperform training with smaller amounts of carefully-labeled data, and it is shown that noise-robust loss functions can be effective in improving performance in presence of corrupted labels.
Iterative Learning with Open-set Noisy Labels
- Computer Science · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) · 2018
A novel iterative learning framework for training CNNs on datasets with open-set noisy labels that detects noisy labels and learns deep discriminative features in an iterative fashion and designs a Siamese network to encourage clean labels and noisy labels to be dissimilar.
Label-efficient audio classification through multitask learning and self-supervision
- Computer Science · arXiv · 2019
This work trains an end-to-end audio feature extractor based on WaveNet that feeds into simple, yet versatile task-specific neural networks and describes several easily implemented self-supervised learning tasks that can operate on any large, unlabeled audio corpus.
Learning from Noisy Large-Scale Datasets with Minimal Supervision
- Computer Science · IEEE Conference on Computer Vision and Pattern Recognition (CVPR) · 2017
An approach to effectively use millions of images with noisy annotations in conjunction with a small subset of cleanly-annotated images to learn powerful image representations; it is particularly effective for a large number of classes with a wide range of noise in annotations.
Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels
- Computer Science · NeurIPS · 2018
A theoretically grounded set of noise-robust loss functions that can be seen as a generalization of MAE and CCE are presented and can be readily applied with any existing DNN architecture and algorithm, while yielding good performance in a wide range of noisy label scenarios.
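The generalized cross-entropy (Lq) loss referred to above has a compact closed form: L_q(p, y) = (1 − p_y^q) / q with q ∈ (0, 1], recovering categorical cross-entropy as q → 0 and MAE (up to a constant factor) at q = 1. A minimal NumPy sketch:

```python
import numpy as np

def generalized_cross_entropy(probs, labels, q=0.7):
    """Lq loss of Zhang & Sabuncu (2018): (1 - p_y^q) / q, batch-averaged.

    probs:  (N, C) predicted class probabilities
    labels: (N,)   integer class indices
    q:      in (0, 1]; q -> 0 recovers CCE, q = 1 recovers MAE up to a constant
    """
    p_y = probs[np.arange(len(labels)), labels]
    return float(np.mean((1.0 - p_y ** q) / q))
```

The paper's experiments commonly use q = 0.7, which interpolates between the noise robustness of MAE and the fast convergence of cross-entropy.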
Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and Language Processing · 2017
A shrinking deep neural network (DNN) framework incorporating unsupervised feature learning is proposed to handle the multilabel classification task, together with a symmetric or asymmetric deep denoising auto-encoder (syDAE or asyDAE) that generates new data-driven features from logarithmic Mel-filter bank features.
DCASE 2019 Task 2: Multitask Learning, Semi-supervised Learning and Model Ensemble with Noisy Data for Audio Tagging
- Computer Science · DCASE · 2019
This paper describes the approach to the DCASE 2019 challenge Task 2: Audio tagging with noisy labels and minimal supervision, a multi-label audio classification with 80 classes, and proposes three strategies, including multitask learning using noisy data and labels that are relabeled using trained models’ predictions.
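The relabeling strategy described here amounts to confidence-thresholded pseudo-labeling. The sketch below is an illustration of that idea, not the DCASE entry's exact rule; the confidence threshold is an assumed parameter.

```python
import numpy as np

def relabel_with_predictions(labels, probs, threshold=0.9):
    """Replace a noisy label with the trained model's prediction when the
    model is confident and disagrees (illustrative rule; the paper's exact
    criterion may differ).

    labels: (N,)   noisy integer labels
    probs:  (N, C) class probabilities from an already-trained model
    """
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    relabeled = labels.copy()
    swap = (conf >= threshold) & (preds != labels)
    relabeled[swap] = preds[swap]
    return relabeled
```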
Training Deep Neural Networks on Noisy Labels with Bootstrapping
- Computer Science · ICLR · 2015
A generic way to handle noisy and incomplete labeling by augmenting the prediction objective with a notion of consistency is proposed, which considers a prediction consistent if the same prediction is made given similar percepts, where the notion of similarity is between deep network features computed from the input data.
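The consistency objective above is concrete in the "soft" bootstrapping variant: the noisy one-hot target t is blended with the network's current prediction p before the cross-entropy is taken, L = −Σ_k (β t_k + (1 − β) p_k) log p_k. A NumPy sketch (β = 0.95 is the soft-bootstrap value reported in the paper):

```python
import numpy as np

def soft_bootstrap_loss(probs, targets_onehot, beta=0.95):
    """Soft bootstrapping loss (Reed et al., 2015):
    L = -mean_n sum_k (beta * t_k + (1 - beta) * p_k) * log p_k
    With beta = 1 this reduces to ordinary cross-entropy on the noisy labels;
    smaller beta lets the network's own predictions soften bad targets.
    """
    eps = 1e-12  # guard against log(0)
    mixed = beta * targets_onehot + (1.0 - beta) * probs
    return float(-np.mean(np.sum(mixed * np.log(probs + eps), axis=1)))
```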