Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework With Loss Masking
@article{Fonseca2020AddressingML,
  title={Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework With Loss Masking},
  author={Eduardo Fonseca and Shawn Hershey and Manoj Plakal and Daniel P. W. Ellis and Aren Jansen and R. Channing Moore},
  journal={IEEE Signal Processing Letters},
  year={2020},
  volume={27},
  pages={1235--1239}
}
The study of label noise in sound event recognition has recently gained attention with the advent of larger and noisier datasets. This work addresses the problem of missing labels, one of the big weaknesses of large audio datasets, and one of the most conspicuous issues for AudioSet. We propose a simple and model-agnostic method based on a teacher-student framework with loss masking to first identify the most critical missing label candidates, and then ignore their contribution during the…
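The loss-masking idea in the abstract can be illustrated with a minimal sketch. The abstract is truncated, so the details here are assumptions rather than the authors' procedure: the function name `masked_bce`, the fixed `threshold`, and the rule "a nominally negative label is a missing-label candidate when the teacher's score is high" are all hypothetical choices made for illustration.

```python
import numpy as np

def masked_bce(student_probs, labels, teacher_probs, threshold=0.9, eps=1e-7):
    """Binary cross-entropy that skips suspected missing labels.

    Hypothetical selection rule: an entry is masked out when its label is 0
    (nominally negative) but the teacher's probability is >= `threshold`.
    Returns the masked mean loss and the 0/1 mask that was applied.
    """
    student_probs = np.clip(np.asarray(student_probs, dtype=float), eps, 1.0 - eps)
    labels = np.asarray(labels, dtype=float)
    teacher_probs = np.asarray(teacher_probs, dtype=float)

    # Per-entry binary cross-entropy, shape (clips, classes).
    bce = -(labels * np.log(student_probs)
            + (1.0 - labels) * np.log(1.0 - student_probs))

    # mask is 0 for suspected missing labels and 1 everywhere else.
    suspected_missing = (labels == 0) & (teacher_probs >= threshold)
    mask = 1.0 - suspected_missing.astype(float)

    # Average only over the entries that are kept.
    loss = float((bce * mask).sum() / max(mask.sum(), 1.0))
    return loss, mask

# One clip, three classes: the second class is labeled negative,
# but the (hypothetical) teacher is confident it is present.
labels = [[1, 0, 0]]
teacher_probs = [[0.80, 0.95, 0.10]]
student_probs = [[0.5, 0.5, 0.5]]
loss, mask = masked_bce(student_probs, labels, teacher_probs)
```

In this sketch the masked entry contributes neither to the loss nor to the normalizer, so the student is not penalized for predicting a class that the labels may have simply missed.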
16 Citations
Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition
- Computer Science, Physics
- ICASSP
- 2022
A VocalSound dataset consisting of over 21,000 crowdsourced recordings of laughter, sighs, coughs, throat clearing, sneezes, and sniffs from 3,365 unique subjects is created to support research on building robust and accurate vocal sound recognition.
Semi-Supervised Audio Classification with Partially Labeled Data
- Computer Science
- 2021 IEEE International Symposium on Multimedia (ISM)
- 2021
This paper presents two semi-supervised methods capable of learning with missing labels and evaluates them on two publicly available, partially labeled datasets.
FSD50K: An Open Dataset of Human-Labeled Sound Events
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2022
FSD50K is introduced, an open dataset containing over 51 k audio clips totalling over 100 h of audio manually labeled using 200 classes drawn from the AudioSet Ontology, to provide an alternative benchmark dataset and thus foster SER research.
PSLA: Improving Audio Event Classification with Pretraining, Sampling, Labeling, and Aggregation
- Computer Science
- ArXiv
- 2021
PSLA is presented, a collection of training techniques that noticeably boost model accuracy, including ImageNet pretraining, balanced sampling, data augmentation, label enhancement, and model aggregation, together with their design choices; it achieves a new state-of-the-art mean average precision on AudioSet.
PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2021
PSLA is presented, a collection of model-agnostic training techniques that noticeably boost model accuracy, including ImageNet pretraining, balanced sampling, data augmentation, label enhancement, and model aggregation.
Symptom Identification for Interpretable Detection of Multiple Mental Disorders
- Psychology
- ArXiv
- 2022
Mental disease detection (MDD) from social media has suffered from poor generalizability and interpretability, due to lack of symptom modeling. This paper introduces PsySym, the first annotated…
CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification
- Computer Science
- ArXiv
- 2022
An intriguing interaction is found between two very different model families: CNN and AST models are good teachers for each other. When either of them is used as the teacher and the other is trained as the student via knowledge distillation, the performance of the student model noticeably improves and, in many cases, is better than that of the teacher model.
Enriching Ontology with Temporal Commonsense for Low-Resource Audio Tagging
- Computer Science
- CIKM
- 2021
This work investigates robust audio tagging models in low-resource scenarios with the enhancement of knowledge graphs and proposes a semi-automatic approach that can construct temporal knowledge graphs on diverse domain-specific label sets.
Sound Event Detection: A tutorial
- Art
- IEEE Signal Processing Magazine
- 2021
Imagine standing on a street corner in the city. With your eyes closed you can hear and recognize a succession of sounds: cars passing by, people speaking, their footsteps when they walk by, and the…
Improving Sound Event Classification by Increasing Shift Invariance in Convolutional Neural Networks
- Computer Science
- 2021
This paper evaluates two pooling methods to improve shift invariance in CNNs, based on low-pass filtering and adaptive sampling of incoming feature maps, and shows that these modifications consistently improve sound event classification in all cases considered, without adding any (or adding very few) trainable parameters, which makes them an appealing alternative to conventional pooling layers.
References
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
- Computer Science
- ArXiv
- 2017
This work introduces two simple global hyper-parameters that efficiently trade off between latency and accuracy and demonstrates the effectiveness of MobileNets across a wide range of applications and use cases, including object detection, fine-grain classification, face attributes, and large-scale geo-localization.
Audio Set: An ontology and human-labeled dataset for audio events
- Computer Science
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2020
This paper proposes pretrained audio neural networks (PANNs) trained on the large-scale AudioSet dataset, and investigates the performance and computational complexity of PANNs modeled by a variety of convolutional neural networks.
SeCoST: Sequential Co-Supervision for Weakly Labeled Audio Event Detection
- Computer Science
- ArXiv
- 2019
Confident Learning: Estimating Uncertainty in Dataset Labels
- Computer Science
- J. Artif. Intell. Res.
- 2021
This work builds on the assumption of a classification noise process to directly estimate the joint distribution between noisy (given) labels and uncorrupted (unknown) labels, resulting in a generalized confident learning (CL) approach that is provably consistent and experimentally performant.
The Impact of Missing Labels and Overlapping Sound Events on Multi-label Multi-instance Learning for Sound Event Classification
- Computer Science
- DCASE
- 2019
This paper investigates two state-of-the-art methodologies that allow this type of learning, low-resolution multi-label non-negative matrix deconvolution (LRM-NMD) and CNN, and shows good robustness to missing labels.
Model-Agnostic Approaches To Handling Noisy Labels When Training Sound Event Classifiers
- Computer Science
- 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
- 2019
This work evaluates simple and efficient model-agnostic approaches to handling noisy labels when training sound event classifiers, namely label smoothing regularization, mixup, and noise-robust loss functions, which can be easily incorporated into existing deep learning pipelines without the need for network modifications or extra resources.
A Deep Residual Network for Large-Scale Acoustic Scene Analysis
- Computer Science
- INTERSPEECH
- 2019
The task of training a multi-label event classifier directly from the audio recordings of AudioSet is studied and it is found that the models are able to localize audio events when a finer time resolution is needed.
Sound Event Detection Using Point-Labeled Data
- Computer Science
- 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
- 2019
This work illustrates methods to train a SED model on point-labeled data and shows that a model trained on point-labeled audio data significantly outperforms weak models and is comparable to a model trained on strongly labeled data.
Audio tagging with noisy labels and minimal supervision
- Computer Science
- DCASE
- 2019
This paper presents the task setup, the FSDKaggle2019 dataset prepared for this scientific evaluation, and a baseline system consisting of a convolutional neural network.