• Corpus ID: 13742708

A Closer Look at Weak Label Learning for Audio Events

  • Ankit Shah, Anurag Kumar, Alexander Hauptmann, Bhiksha Raj
Audio content analysis in terms of sound events is an important research problem for a variety of applications. Recently, the development of weak labeling approaches for audio or sound event detection (AED) and the availability of a large-scale weakly labeled dataset have finally opened up the possibility of large-scale AED. However, a deeper understanding of how weak labels affect the learning of sound events is still missing from the literature. In this work, we first describe a CNN-based approach for… 


Learning Sound Event Classifiers from Web Audio with Noisy Labels
Experiments suggest that training with large amounts of noisy data can outperform training with smaller amounts of carefully labeled data, and it is shown that noise-robust loss functions can be effective in improving performance in the presence of corrupted labels.
Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework With Loss Masking
This work proposes a simple, model-agnostic method based on a teacher-student framework with loss masking: the most critical missing-label candidates are first identified, and their contribution is then ignored during learning. A simple optimisation of the training label set is found to improve recognition performance without additional computation.
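The loss-masking idea summarised above can be illustrated with a minimal NumPy sketch. It assumes that a missing-label candidate is a negative label (0) for which a teacher model's predicted probability exceeds a fixed threshold; this selection rule, the function names `missing_label_mask` and `masked_bce`, and the `threshold` value are hypothetical and are not taken from the paper.

```python
import numpy as np


def missing_label_mask(labels, teacher_probs, threshold=0.9):
    """Return True where a label entry should contribute to the loss.

    Hypothetical rule: a negative label (0) is treated as a likely missing
    positive, and masked out, when the teacher's probability exceeds `threshold`.
    """
    labels = np.asarray(labels)
    teacher_probs = np.asarray(teacher_probs)
    return ~((labels == 0) & (teacher_probs > threshold))


def masked_bce(student_probs, labels, mask, eps=1e-7):
    """Binary cross-entropy averaged over the unmasked entries only."""
    p = np.clip(np.asarray(student_probs, dtype=float), eps, 1.0 - eps)
    y = np.asarray(labels, dtype=float)
    per_entry = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return float(per_entry[mask].sum() / max(int(mask.sum()), 1))


# One clip, three classes: the teacher is confident class 1 is present
# even though its label is 0, so that entry is masked out.
labels = np.array([[1, 0, 0]])
teacher = np.array([[0.90, 0.95, 0.05]])
student = np.array([[0.80, 0.50, 0.10]])
mask = missing_label_mask(labels, teacher)
loss = masked_bce(student, labels, mask)
```

In this variant, masked entries are simply dropped from the average rather than relabelled as positives.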
Self-supervised Attention Model for Weakly Labeled Audio Event Classification
A novel approach to weakly labeled audio event classification, based on a self-supervised attention model, is proposed; it achieves 8.8% and 17.6% relative mean average precision improvements over the current state-of-the-art systems for SL-DCASE-17 and balanced AudioSet, respectively.
SeCoST: Sequential Co-Supervision for Large Scale Weakly Labeled Audio Event Detection
  • Anurag Kumar, V. Ithapu
  • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
A new framework for designing learning models with weak supervision, bridging ideas from sequential learning and knowledge distillation, is proposed. It is referred to as SeCoST (pronounced "Sequest"), short for Sequential Co-supervision for training generations of Students.
Teacher-student Training for Acoustic Event Detection Using Audioset
This paper investigates a teacher-student approach for training low-complexity student models from large teachers, and describes a framework that enables learning arbitrary small-footprint AED systems, either generic or domain-expert, from generic teachers.
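A common way to train a multi-label student from a teacher is to mix a loss on the ground-truth labels with a loss on the teacher's predicted probabilities. The sketch below assumes binary cross-entropy for both terms and a hypothetical mixing weight `lam`; it illustrates generic teacher-student training, not the specific framework described in the paper.

```python
import numpy as np


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; `targets` may be hard (0/1) or soft."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    t = np.asarray(targets, dtype=float)
    return float(np.mean(-(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))))


def distillation_loss(student_probs, teacher_probs, labels, lam=0.5):
    """Hypothetical mix of a hard-label term and a teacher (soft-label) term."""
    hard = bce(student_probs, labels)
    soft = bce(student_probs, teacher_probs)
    return lam * hard + (1.0 - lam) * soft


student = np.array([0.9, 0.1])
teacher = np.array([0.7, 0.2])
labels = np.array([1, 0])
```

With `lam=1.0` only the ground-truth term remains; with `lam=0.0` the student is pushed purely towards the teacher's probabilities, since the soft term is smallest when the two match.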
The Impact of Missing Labels and Overlapping Sound Events on Multi-label Multi-instance Learning for Sound Event Classification
This paper investigates two state-of-the-art methodologies that allow multi-label multi-instance learning, low-resolution multi-label non-negative matrix deconvolution (LRM-NMD) and a CNN, and reports good robustness to missing labels.
Polyphonic Sound Event Detection with Weak Labeling
This thesis proposes to train deep learning models for SED using various levels of weak labeling, and shows that the sound events can be learned and localized by a recurrent neural network (RNN) with a connectionist temporal classification (CTC) output layer, which is well suited for sequential supervision.
Power Pooling: An Adaptive Pooling Function for Weakly Labelled Sound Event Detection
An adaptive power pooling function is proposed that can automatically adapt to various sound sources and that outperforms the state-of-the-art linear softmax pooling on both coarse-grained and fine-grained metrics.
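The sketch below assumes a power pooling form `y_hat = sum_t y_t^(n+1) / sum_t y_t^n`, in which each frame probability `y_t` is weighted by `y_t^n`. Under this assumption, `n = 0` gives mean pooling, `n = 1` gives linear softmax pooling (`sum_t y_t^2 / sum_t y_t`), and large `n` approaches max pooling. The summary describes the pooling as adaptive; here `n` is simply a fixed argument, and the function name `power_pool` is hypothetical.

```python
import numpy as np


def power_pool(frame_probs, n):
    """Pool frame-level probabilities of shape [T, C] into clip-level [C].

    Each frame is weighted by y_t ** n, so larger n puts more weight on
    confident frames. `n` may be a scalar or a per-class array of shape [C].
    """
    y = np.asarray(frame_probs, dtype=float)
    w = y ** np.asarray(n, dtype=float)
    # Guard against 0/0 when every frame probability is exactly zero.
    denom = np.maximum(w.sum(axis=0), 1e-12)
    return (w * y).sum(axis=0) / denom


# Three frames, one class.
y = np.array([[0.1], [0.5], [0.9]])
```

Allowing a per-class exponent reflects the idea that different sound sources may call for different pooling behaviour, from averaging over long events to focusing on short, salient ones.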
Limitations of Weak Labels for Embedding and Tagging
This paper creates a dataset that focuses on the difference between strong and weak labels as opposed to other challenges, and investigates the impact of weak labels when training an embedding or an end-to-end classifier.


Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data
A robust and efficient deep convolutional neural network (CNN) based framework is proposed for learning audio event recognizers from weakly labeled data; it can train on and analyze recordings of variable length efficiently, and it outperforms a network trained with strongly labeled web data by a considerable margin.
Audio Event Detection using Weakly Labeled Data
It is shown that audio event detection using weak labels can be formulated as a multiple instance learning (MIL) problem, and two MIL frameworks are suggested: one based on support vector machines and the other on neural networks.
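Under the standard multiple instance learning assumption, a recording (bag) is positive for an event if at least one of its segments (instances) is positive, so a bag-level score can be obtained by max-pooling instance scores. The minimal sketch below shows only this aggregation step; the instance scores, the `threshold`, and the function names are hypothetical, and the paper's SVM- and neural-network-based MIL frameworks are not reproduced.

```python
import numpy as np


def bag_score(instance_scores):
    """Bag-level score under the standard MIL assumption (max over instances)."""
    return float(np.max(instance_scores))


def predict_bags(bags, threshold=0.5):
    """Label each bag 1 if its highest-scoring instance reaches the threshold."""
    return [int(bag_score(b) >= threshold) for b in bags]


# Bags may contain different numbers of instances (segments).
bags = [np.array([0.1, 0.2, 0.8]), np.array([0.1, 0.3])]
```

If the instance scores come from a neural network trained only on bag labels, the gradient of a loss on the max-pooled score reaches only the top-scoring instance of each bag.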
Knowledge Transfer from Weakly Labeled Audio Using Convolutional Neural Network for Sound Events and Scenes
This work describes a convolutional neural network (CNN) based framework for sound event detection and classification using weakly labeled audio data, and proposes methods for learning representations with this model that can be used effectively to solve the target task.
Audio event and scene recognition: A unified approach using strongly and weakly labeled data
  • B. Raj, Anurag Kumar
  • 2017 International Joint Conference on Neural Networks (IJCNN)
  • 2017
The main method is based on manifold regularization on graphs; it is shown that the unified learning can be formulated as a constrained optimization problem, which can be solved with the iterative concave-convex procedure (CCCP).
Attention and Localization Based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging
A weakly supervised method is proposed that not only predicts the tags but also indicates the temporal locations of the acoustic events that occur; the attention scheme is found to be effective in identifying the important frames while ignoring unrelated ones.
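Attention-based pooling of this kind is often written as a weighted average of frame-level class probabilities, with weights given by a softmax over time of frame-level attention scores. The sketch below assumes that form, with both inputs supplied as arrays rather than produced by a convolutional recurrent network; the normalization and the function name `attention_pool` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np


def attention_pool(frame_probs, attention_logits):
    """Weighted average over time of frame probabilities.

    frame_probs, attention_logits: arrays of shape [T, C].
    Returns clip-level probabilities [C] and the attention weights [T, C].
    """
    frame_probs = np.asarray(frame_probs, dtype=float)
    attention_logits = np.asarray(attention_logits, dtype=float)
    # Softmax over the time axis, shifted by the max for numerical stability.
    z = attention_logits - attention_logits.max(axis=0, keepdims=True)
    e = np.exp(z)
    weights = e / e.sum(axis=0, keepdims=True)
    clip_probs = (weights * frame_probs).sum(axis=0)
    return clip_probs, weights


# Two frames, one class.
fp = np.array([[0.2], [0.6]])
```

The returned weights show which frames dominate each class's clip-level score, which is how such a model can point to the temporal location of an event while being trained only on clip-level tags.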
Weakly-supervised audio event detection using event-specific Gaussian filters and fully convolutional networks
A model based on convolutional neural networks that relies only on weakly-supervised data for training and is able to detect frame-level information, e.g., the temporal position of sounds, even when it is trained merely with clip-level labels.
Audio Set: An ontology and human-labeled dataset for audio events
The creation of Audio Set, a large-scale dataset of manually annotated audio events, is described; the dataset endeavors to bridge the gap in data availability between image and audio research and to substantially stimulate the development of high-performance audio event recognizers.
Bag-of-Audio-Words Approach for Multimedia Event Classification
Variations of the bag-of-audio-words (BoAW) method are explored, and results on the NIST 2011 Multimedia Event Detection (MED) dataset are presented.
Weakly supervised scalable audio content analysis
  • Anurag Kumar, B. Raj
  • 2016 IEEE International Conference on Multimedia and Expo (ICME)
  • 2016
A weakly supervised learning framework is proposed that can exploit the tremendous amount of web multimedia data with significantly reduced annotation effort and expense; several multiple instance learning algorithms are used to show that audio event detection through weak labels is feasible.
Reducing Model Complexity for DNN Based Large-Scale Audio Classification
  • Yuzhong Wu, Tan Lee
  • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2018
This paper proposes two strategies for constructing low-dimensional embedding feature extractors, thereby reducing the number of parameters in the CNN model; the simplified CNN model is shown to have only 1/22 of the parameters of the original model, with only a slight degradation in performance.