Limitations of Weak Labels for Embedding and Tagging

@inproceedings{Turpault2020LimitationsOW,
  title={Limitations of Weak Labels for Embedding and Tagging},
  author={Nicolas Turpault and Romain Serizel and Emmanuel Vincent},
  booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2020},
  pages={131-135}
}
  • Published 5 February 2020
Many datasets and approaches in ambient sound analysis use weakly labeled data. Weak labels are employed because annotating every data sample with a strong label is too expensive. Yet, their impact on the performance in comparison to strong labels remains unclear. Indeed, weak labels must often be dealt with at the same time as other challenges, namely multiple labels per sample, unbalanced classes and/or overlapping events. In this paper, we formulate a supervised learning problem which… 
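The distinction the abstract draws can be made concrete: a strong label carries onset/offset timestamps per event, while a weak label only records which classes occur somewhere in the clip. A minimal sketch, with invented event names and timestamps:

```python
# Hypothetical sketch: deriving weak (clip-level) labels from strong
# (timestamped) annotations. Event names and times are invented.

# Strong labels: each event has a class plus onset/offset times (seconds).
strong_labels = [
    ("dog_bark", 0.5, 1.2),
    ("speech",   2.0, 6.5),
    ("dog_bark", 7.1, 7.9),
]

# Weak labels: only the set of classes present in the clip,
# with no temporal information.
weak_labels = sorted({cls for cls, onset, offset in strong_labels})
print(weak_labels)  # ['dog_bark', 'speech']
```

Going the other way, from weak labels back to temporal locations, is exactly the information loss the paper quantifies.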

Citations

Training Sound Event Detection on a Heterogeneous Dataset
TLDR
This work proposes a detailed analysis of the DCASE 2020 task 4 sound event detection baseline with regard to several aspects, such as the type of data used for training, the parameters of the mean teacher, or the transformations applied while generating the synthetic soundscapes.
Auto-labeling of sensor data using social media messages: a case study for a smart city
TLDR
This study investigates the feasibility of utilizing social media messages to extract meaningful labels for machine learning to predict events and situations in smart city environments, and proposes an improved clustering algorithm using an outlier detection technique to extract meaningful labels in a more robust way.
FSD50K: An Open Dataset of Human-Labeled Sound Events
TLDR
FSD50K is introduced, an open dataset containing over 51k audio clips totalling over 100 h of audio manually labeled using 200 classes drawn from the AudioSet Ontology, to provide an alternative benchmark dataset and thus foster SER research.
THE SMALLRICE SUBMISSION TO THE DCASE2021 TASK 4 CHALLENGE: A LIGHTWEIGHT APPROACH FOR SEMI-SUPERVISED SOUND EVENT DETECTION WITH UNSUPERVISED DATA AUGMENTATION Technical Report
TLDR
This work focuses on training a lightweight and well-performing model which can be used in real-world applications, and which achieves performance competitive with the more common mean teacher paradigm.
The Benefit of Temporally-Strong Labels in Audio Event Classification
  • Shawn Hershey, D. Ellis, M. Plakal
  • Computer Science
  • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
TLDR
It is shown that fine-tuning with a mix of weakly- and strongly-labeled data can substantially improve classifier performance, even when evaluated using only the original weak labels.

References

Showing 1-10 of 32 references
A Closer Look at Weak Label Learning for Audio Events
TLDR
This work describes a CNN-based approach for weakly supervised training of audio events, identifies important characteristics which naturally arise in weakly supervised learning of sound events, and shows how these aspects of weak labels affect the generalization of models.
Learning from Weak and Noisy Labels for Semantic Segmentation
TLDR
This work casts the WSSS problem into a label noise reduction problem, and develops an efficient learning algorithm by introducing an intermediate labelling variable to directly and explicitly detect noisy labels.
Adaptive Pooling Operators for Weakly Labeled Sound Event Detection
TLDR
This paper treats SED as a multiple instance learning (MIL) problem, where training labels are static over a short excerpt, indicating the presence or absence of sound sources but not their temporal locality, and develops a family of adaptive pooling operators, referred to as autopool, which smoothly interpolate between common pooling operators and automatically adapt to the characteristics of the sound sources in question.
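The interpolation that autopool performs can be sketched in a few lines: it is a softmax-weighted mean of frame-level probabilities, where a parameter alpha (learned per class in the paper, fixed here for illustration) moves the operator between mean and max pooling:

```python
import math

def autopool(probs, alpha):
    """Auto-pool: softmax-weighted mean of frame-level probabilities.
    alpha = 0 reduces to mean pooling; large alpha approaches max
    pooling. In the paper alpha is a learned per-class parameter;
    here it is a fixed argument for illustration."""
    weights = [math.exp(alpha * p) for p in probs]
    total = sum(weights)
    return sum(p * w / total for p, w in zip(probs, weights))

frames = [0.1, 0.2, 0.9, 0.3]  # invented frame-level probabilities
print(round(autopool(frames, 0.0), 3))    # 0.375, the mean
print(round(autopool(frames, 100.0), 3))  # 0.9, approaching the max
```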
Audio Event Detection using Weakly Labeled Data
TLDR
It is shown that audio event detection using weak labels can be formulated as a multiple instance learning problem, and two frameworks for solving it are suggested: one based on support vector machines, the other on neural networks.
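The MIL formulation above rests on one assumption: a clip (bag) is positive for a class if at least one frame (instance) is positive, so frame-level scores are aggregated with a max. A minimal sketch, with invented scores:

```python
# MIL view of weakly labeled audio event detection: aggregate
# per-frame scores into a clip-level decision with a max.

def bag_prediction(instance_scores, threshold=0.5):
    """A bag is positive if any instance score exceeds the threshold."""
    return max(instance_scores) >= threshold

clip_scores = [0.05, 0.10, 0.85, 0.20]  # per-frame scores for one class
print(bag_prediction(clip_scores))  # True: one frame exceeds 0.5
```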
Look, Listen, and Learn More: Design Choices for Deep Audio Embeddings
TLDR
This paper investigates how L3-Net design choices impact the performance of downstream audio classifiers trained with these embeddings, and shows that audio-informed choices of input representation are important, and that using sufficient data for training the embedding is key.
Sound Event Detection from Partially Annotated Data: Trends and Challenges
TLDR
A detailed analysis of the impact of the time segmentation, the event classification and the methods used to exploit unlabeled data on the final performance of sound event detection systems is proposed.
Audio Set: An ontology and human-labeled dataset for audio events
TLDR
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis
TLDR
The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, which mixes part of last year's dataset with an additional synthetic, strongly labeled dataset provided this year and described in more detail.
Prototypical Networks for Few-shot Learning
TLDR
This work proposes Prototypical Networks for few-shot classification, and provides an analysis showing that some simple design decisions can yield substantial improvements over recent approaches involving complicated architectural choices and meta-learning.
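The classification rule behind prototypical networks is simple enough to sketch directly: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype under squared Euclidean distance. The 2-D embeddings below are invented for illustration:

```python
# Sketch of nearest-prototype classification (prototypical networks).

def mean_vec(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Invented support embeddings: two examples per class.
support = {
    "speech":   [[0.9, 0.1], [1.1, 0.0]],
    "dog_bark": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = {cls: mean_vec(vs) for cls, vs in support.items()}

query = [0.8, 0.2]
pred = min(prototypes, key=lambda c: sq_dist(prototypes[c], query))
print(pred)  # 'speech'
```

In the paper the embeddings come from a learned network; the simplicity of this decision rule is exactly the point of the "simple design decisions" claim in the TLDR.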
Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging
TLDR
A shrinking deep neural network (DNN) framework incorporating unsupervised feature learning is proposed to handle the multilabel classification task, along with a symmetric or asymmetric deep denoising auto-encoder (syDAE or asyDAE) to generate new data-driven features from the logarithmic Mel-filter bank features.