A Closer Look at Weak Label Learning for Audio Events
@article{Shah2018ACL, title={A Closer Look at Weak Label Learning for Audio Events}, author={Ankit Shah and Anurag Kumar and Alexander Hauptmann and Bhiksha Raj}, journal={ArXiv}, year={2018}, volume={abs/1804.09288} }
Audio content analysis in terms of sound events is an important research problem for a variety of applications. Recently, the development of weak labeling approaches for audio or sound event detection (AED) and the availability of large-scale weakly labeled datasets have finally opened up the possibility of large-scale AED. However, a deeper understanding of how weak labels affect the learning for sound events is still missing from the literature. In this work, we first describe a CNN based approach for…
46 Citations
Learning Sound Event Classifiers from Web Audio with Noisy Labels
- Computer Science
- ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019
Experiments suggest that training with large amounts of noisy data can outperform training with smaller amounts of carefully-labeled data, and it is shown that noise-robust loss functions can be effective in improving performance in presence of corrupted labels.
Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework With Loss Masking
- Computer Science
- IEEE Signal Processing Letters
- 2020
This work proposes a simple and model-agnostic method based on a teacher-student framework with loss masking to first identify the most critical missing label candidates, and then ignore their contribution during the learning process, finding that a simple optimisation of the training label set improves recognition performance without additional computation.
Self-supervised Attention Model for Weakly Labeled Audio Event Classification
- Computer Science
- 2019 27th European Signal Processing Conference (EUSIPCO)
- 2019
A novel weakly labeled Audio Event Classification approach based on a self-supervised attention model that achieves 8.8% and 17.6% relative mean average precision improvements over the current state-of-the-art systems for SL-DCASE-17 and balanced AudioSet.
SeCoST: Sequential Co-Supervision for Large Scale Weakly Labeled Audio Event Detection
- Computer Science
- ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
A new framework for designing learning models with weak supervision by bridging ideas from sequential learning and knowledge distillation is proposed, referred to as SeCoST (pronounced Sequest) — Sequential Co-supervision for training generations of Students.
Teacher-student Training for Acoustic Event Detection Using Audioset
- Computer Science
- ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019
This paper investigates a teacher-student training approach of learning low-complexity student models, using large teachers and describes a framework that enables learning arbitrary small-footprint, generic or domain-expert, AED systems from generic teachers.
The Impact of Missing Labels and Overlapping Sound Events on Multi-label Multi-instance Learning for Sound Event Classification
- Computer Science
- DCASE
- 2019
This paper investigates two state-of-the-art methodologies that allow this type of learning, low-resolution multi-label non-negative matrix deconvolution (LRM-NMD) and CNN, and shows good robustness to missing labels.
Joint framework with deep feature distillation and adaptive focal loss for weakly supervised audio tagging and acoustic event detection
- Computer Science
- Digit. Signal Process.
- 2022
Polyphonic Sound Event Detection with Weak Labeling
- Computer Science
- 2017
This thesis proposes to train deep learning models for SED using various levels of weak labeling, and shows that the sound events can be learned and localized by a recurrent neural network (RNN) with a connectionist temporal classification (CTC) output layer, which is well suited for sequential supervision.
Power Pooling: An Adaptive Pooling Function for Weakly Labelled Sound Event Detection
- Computer Science
- 2021 International Joint Conference on Neural Networks (IJCNN)
- 2021
An adaptive power pooling function which can automatically adapt to various sound sources and outperforms the state-of-the-art linear softmax pooling on both coarse-grained and fine-grained metrics is proposed.
Limitations of Weak Labels for Embedding and Tagging
- Computer Science
- ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
This paper creates a dataset that focuses on the difference between strong and weak labels as opposed to other challenges, and investigates the impact of weak labels when training an embedding or an end-to-end classifier.
References
SHOWING 1-10 OF 49 REFERENCES
Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data
- Computer Science
- ArXiv
- 2017
A robust and efficient deep convolutional neural network (CNN) based framework to learn audio event recognizers from weakly labeled data that can train from and analyze recordings of variable length in an efficient manner and outperforms a network trained with strongly labeled web data by a considerable margin.
Audio Event Detection using Weakly Labeled Data
- Computer Science
- ACM Multimedia
- 2016
It is shown that audio event detection using weak labels can be formulated as a multiple-instance learning problem, and two frameworks for solving multiple-instance learning are suggested, one based on support vector machines, and the other on neural networks.
Knowledge Transfer from Weakly Labeled Audio Using Convolutional Neural Network for Sound Events and Scenes
- Computer Science
- 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
This work describes a convolutional neural network (CNN) based framework for sound event detection and classification using weakly labeled audio data and proposes methods to learn representations using this model which can be effectively used for solving the target task.
Audio event and scene recognition: A unified approach using strongly and weakly labeled data
- Computer Science
- 2017 International Joint Conference on Neural Networks (IJCNN)
- 2017
The main method is based on manifold regularization on graphs, in which it is shown that the unified learning can be formulated as a constrained optimization problem that can be solved by the iterative concave-convex procedure (CCCP).
Attention and Localization Based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging
- Computer Science
- INTERSPEECH
- 2017
A weakly supervised method is proposed that not only predicts the tags but also indicates the temporal locations of the acoustic events that occurred; the attention scheme is found to be effective in identifying the important frames while ignoring the unrelated frames.
Weakly-supervised audio event detection using event-specific Gaussian filters and fully convolutional networks
- Computer Science
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017
A model based on convolutional neural networks that relies only on weakly-supervised data for training and is able to detect frame-level information, e.g., the temporal position of sounds, even when it is trained merely with clip-level labels.
Audio Set: An ontology and human-labeled dataset for audio events
- Computer Science
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
Bag-of-Audio-Words Approach for Multimedia Event Classification
- Computer Science
- INTERSPEECH
- 2012
Variations of the BoAW method are explored and results on NIST 2011 multimedia event detection (MED) dataset are presented.
Weakly supervised scalable audio content analysis
- Computer Science
- 2016 IEEE International Conference on Multimedia and Expo (ICME)
- 2016
A weakly supervised learning framework which can make use of the tremendous amount of web multimedia data with significantly reduced annotation effort and expense is proposed and several multiple instance learning algorithms are used to show that audio event detection through weak labels is feasible.
Reducing Model Complexity for DNN Based Large-Scale Audio Classification
- Computer Science
- 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
This paper proposes two different strategies that aim at constructing low-dimensional embedding feature extractors and hence reducing the number of model parameters in the CNN model, and shows that the simplified CNN model has only 1/22 of the model parameters of the original model, with only a slight degradation of performance.