Corpus ID: 51864948

Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments

Romain Serizel, Nicolas Turpault, Hamid Eghbal-zadeh, Ankit Parag Shah
This paper presents DCASE 2018 Task 4. The task evaluates systems for the large-scale detection of sound events using weakly labeled data (without time boundaries). The target of the systems is to provide not only the event class but also the event time boundaries, given that multiple events can be present in an audio recording. Another challenge of the task is to explore the possibility of exploiting a large amount of unbalanced and unlabeled training data together with a small weakly labeled…
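The weak-label setting described above is commonly handled by producing frame-level class probabilities and pooling them to clip level: the pooled score is compared against the weak (clip-level) label, while thresholded frames give rough event boundaries. A minimal numpy sketch of that idea (max pooling and the two thresholds are illustrative assumptions, not the challenge baseline):

```python
import numpy as np

def clip_and_frame_predictions(frame_probs, clip_threshold=0.5, frame_threshold=0.5):
    """Derive clip-level (weak) and frame-level (strong) decisions from
    per-frame class probabilities of shape (n_frames, n_classes).

    Max pooling over time gives the clip-level score compared against the
    weak label; frames above frame_threshold (for active classes) give
    rough event time boundaries.
    """
    clip_probs = frame_probs.max(axis=0)                    # weak, clip-level scores
    active_classes = clip_probs >= clip_threshold           # which classes are present
    frame_activity = (frame_probs >= frame_threshold) & active_classes
    return clip_probs, frame_activity

# toy example: 6 frames, 2 classes; class 0 is active in frames 2-4
probs = np.array([[0.1, 0.2],
                  [0.2, 0.1],
                  [0.9, 0.2],
                  [0.8, 0.1],
                  [0.7, 0.3],
                  [0.1, 0.2]])
clip, frames = clip_and_frame_predictions(probs)
```

With these toy scores, class 0 is detected at clip level and localized to frames 2–4, while class 1 never crosses the clip threshold, so all of its frames stay inactive.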


This paper proposes a constant-Q transform based input feature for the baseline architecture to learn the start and end times of sound events (strong labels) in an audio recording given just the…
Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis
The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, mixing part of last year's dataset with an additional synthetic, strongly labeled dataset provided this year, which is described in more detail.
Sound event detection using weakly-labeled semi-supervised data with GCRNNS, VAT and Self-Adaptive Label Refinement
A gated convolutional recurrent neural network based approach to solve task 4, large-scale weakly labelled semi-supervised sound event detection in domestic environments, of the DCASE 2018 challenge and introduces self-adaptive label refinement, a method which allows unsupervised adaption of the trained system to refine the accuracy of frame-level class predictions.
Sound event detection from weak annotations: weighted-GRU versus multi-instance-learning
This paper addresses the detection of audio events in domestic environments when a weakly annotated dataset is available for training. It explores two approaches: a "weighted-GRU" (WGRU), in which a convolutional recurrent neural network is trained for classification and then exploited at the output of the time-distributed dense layer to perform localization, and a multi-instance-learning approach.
Peer Collaborative Learning for Polyphonic Sound Event Detection
This paper describes how semi-supervised learning called peer collaborative learning (PCL) can be applied to the polyphonic sound event detection (PSED) task, which is one of the tasks in the…
This paper reports experiments on Sound Event Detection in domestic environments in the framework of the DCASE 2019 Task 4 challenge, and explores multi-task training to take advantage of the synthetic and unlabeled in-domain subsets.
Sound Event Detection in the DCASE 2017 Challenge
Analysis of the systems behavior reveals that task-specific optimization has a big role in producing good performance; however, often this optimization closely follows the ranking metric, and its maximization/minimization does not result in universally good performance.
Cosine-similarity penalty to discriminate sound classes in weakly-supervised sound event detection
This work addresses Sound Event Detection in the case where a weakly annotated dataset is available for training, and explores an approach inspired by Multiple Instance Learning, in which a convolutional recurrent neural network is trained to give predictions at frame-level using a custom loss function based on the weak labels and the statistics of the frame-based predictions.
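The MIL-style training described above can be sketched as a clip-level loss built from frame predictions: binary cross-entropy between max-pooled frame scores and the weak labels, plus a regularizer on frame statistics. The following numpy toy is one plausible instance of such a loss under stated assumptions; the function name, the max pooling, and the sparsity penalty are illustrative, not the paper's custom loss:

```python
import numpy as np

def weak_bce_with_sparsity(frame_probs, weak_labels, sparsity_weight=0.1):
    """Toy MIL-style loss.

    frame_probs: (n_frames, n_classes) per-frame probabilities.
    weak_labels: (n_classes,) clip-level labels in {0, 1}.

    Binary cross-entropy between max-pooled clip scores and the weak
    labels, plus a penalty on the mean frame activity of classes the
    weak label marks as absent (a frame-statistics regularizer).
    """
    eps = 1e-7
    clip = frame_probs.max(axis=0).clip(eps, 1 - eps)
    bce = -(weak_labels * np.log(clip)
            + (1 - weak_labels) * np.log(1 - clip)).mean()
    # discourage frame activity for classes absent from the weak label
    absent_activity = (frame_probs.mean(axis=0) * (1 - weak_labels)).sum()
    return bce + sparsity_weight * absent_activity

labels = np.array([1.0, 0.0])
good = np.array([[0.99, 0.01]] * 4)   # frames agree with the weak label
bad = np.array([[0.01, 0.99]] * 4)    # frames contradict it
loss_good = weak_bce_with_sparsity(good, labels)
loss_bad = weak_bce_with_sparsity(bad, labels)
```

Frame predictions consistent with the weak label yield a much smaller loss than contradictory ones, which is the signal that lets a network learn frame-level behavior from clip-level supervision alone.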
Weakly labeled semi-supervised sound event detection using CRNN with inception module
By applying the proposed method to a weakly labeled semi-supervised sound event detection, it was verified that the proposed system provides better performance compared to the DCASE 2018 baseline system.
Joint Acoustic and Class Inference for Weakly Supervised Sound Event Detection
This work presents a hybrid approach that combines an acoustic-driven event boundary detection and a supervised label inference using a deep neural network. It leverages benefits of both unsupervised and supervised methodologies and takes advantage of large amounts of unlabeled data, making it ideal for large-scale weakly labeled event detection.


Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network
In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won 1st place in the large-scale weakly…
Semi-supervised learning helps in sound event classification
  • Zixing Zhang, Björn Schuller
  • Computer Science
    2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2012
Adding unlabelled sound event data to the training set after automatic labelling, based on a sufficient classifier confidence level, can significantly enhance classification performance; combined with optimal re-sampling of the originally labelled instances and iterative semi-supervised learning, the gain can reach approximately half of that achieved by using the originally manually labelled data.
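The confidence-based scheme summarized above is the classic self-training loop: predict on the unlabeled pool, move high-confidence samples into the labeled pool with their predicted labels, and retrain. A minimal numpy sketch of one round (the function signature, the 0.9 threshold, and `predict_proba` are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

def self_training_round(labeled_X, labeled_y, unlabeled_X, predict_proba,
                        confidence=0.9):
    """One round of confidence-based self-training.

    predict_proba: callable mapping an (n, d) array to (n, k) class
    probabilities, e.g. from a model trained on the labeled pool.
    Unlabeled samples whose top class probability reaches `confidence`
    are moved into the labeled pool with their predicted label; the
    rest stay unlabeled for the next round.
    """
    probs = predict_proba(unlabeled_X)
    keep = probs.max(axis=1) >= confidence
    new_X = np.concatenate([labeled_X, unlabeled_X[keep]])
    new_y = np.concatenate([labeled_y, probs[keep].argmax(axis=1)])
    return new_X, new_y, unlabeled_X[~keep]

# toy round: 2 labeled samples, 3 unlabeled, a stubbed classifier
labeled_X = np.zeros((2, 3))
labeled_y = np.array([0, 1])
unlabeled_X = np.ones((3, 3))
fake_proba = lambda X: np.array([[0.95, 0.05],   # confident -> adopted
                                 [0.50, 0.50],   # uncertain -> kept back
                                 [0.99, 0.01]])  # confident -> adopted
new_X, new_y, remaining = self_training_round(
    labeled_X, labeled_y, unlabeled_X, fake_proba)
```

In practice the loop repeats with a retrained model each round, and as the abstract notes, re-sampling the original labeled instances helps keep the growing pseudo-labeled pool from dominating training.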
Audio Set: An ontology and human-labeled dataset for audio events
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System
This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset.
Combining Multi-Scale Features Using Sample-Level Deep Convolutional Neural Networks for Weakly Supervised Sound Event Detection
This paper describes the method submitted to large-scale weakly supervised sound event detection for smart cars in the DCASE Challenge 2017, and shows that the waveform-based models can be comparable to spectrogram-based models when compared to other DCASE Task 4 submissions.
Recurrent neural networks for polyphonic sound event detection in real life recordings
In this paper we present an approach to polyphonic sound event detection in real life recordings based on bi-directional long short term memory (BLSTM) recurrent neural networks (RNNs). A single…
Unsupervised Learning of Semantic Audio Representations
  • A. Jansen, M. Plakal, R. Saurous
  • Computer Science
    2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2018
This work considers several class-agnostic semantic constraints that apply to unlabeled nonspeech audio and proposes low-dimensional embeddings of the input spectrograms that recover 41% and 84% of the performance of their fully-supervised counterparts when applied to downstream query-by-example sound retrieval and sound event classification tasks, respectively.
An approach for self-training audio event detectors using web data
Combining labeled audio from a dataset with unlabeled audio from the web to improve the sound models showed an improvement in AED, and uncovered challenges of using web audio from videos.
Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features
The proposed SED system is compared against the state-of-the-art single-channel method on the development subset of the TUT Sound Events Detection 2016 database, and the usage of spatial and harmonic features is shown to improve the performance of SED.
Audio Event Detection Using Multiple-Input Convolutional Neural Network
This paper describes the model and training framework from the submission for DCASE 2017 task 3: sound event detection in real life audio, and shows meaningful improvements in cross-validation experiments compared to the baseline system.