Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis

Nicolas Turpault, Romain Serizel, Ankit Shah, Justin Salamon
This paper presents Task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge and provides a first analysis of the challenge results. The task is a follow-up to Task 4 of DCASE 2018 and involves training systems for large-scale detection of sound events using a combination of weakly labeled data, i.e. training labels without time boundaries, and strongly labeled synthesized data. The paper introduces Domestic Environment Sound Event Detection (DESED…


SpecAugment for Sound Event Detection in Domestic Environments using Ensemble of Convolutional Recurrent Neural Networks
By combining the proposed methods, sound event detection performance can be enhanced compared with the baseline algorithm; evaluation shows that the proposed method achieves 40.89% on event-based metrics and 66.17% on segment-based metrics.
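The SpecAugment technique named in the title masks random frequency bands and time spans of a spectrogram during training. A minimal sketch of that idea on a spectrogram stored as a list of rows; the mask widths and fill value are illustrative defaults, not the paper's settings:

```python
import random

def spec_augment(spec, max_freq_mask=8, max_time_mask=20, seed=0):
    """SpecAugment-style masking on a spectrogram given as a list of
    freq-bin rows, each a list of time frames: zero out one random
    band of frequency bins and one random span of time frames."""
    rng = random.Random(seed)
    out = [row[:] for row in spec]            # copy; leave the input intact
    n_freq, n_time = len(out), len(out[0])
    f = rng.randint(0, max_freq_mask)         # frequency-mask width
    f0 = rng.randint(0, n_freq - f)           # frequency-mask start bin
    for i in range(f0, f0 + f):
        out[i] = [0.0] * n_time
    t = rng.randint(0, max_time_mask)         # time-mask width
    t0 = rng.randint(0, n_time - t)           # time-mask start frame
    for row in out:
        row[t0:t0 + t] = [0.0] * t
    return out

mel = [[1.0] * 100 for _ in range(64)]        # dummy 64x100 "log-mel"
aug = spec_augment(mel)
print(len(aug), len(aug[0]))                  # 64 100 (shape is preserved)
```

Because the masks are sampled fresh for every training example, each epoch sees a different corrupted view of the same clip, which is what gives the augmentation its regularizing effect.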
A Multi-Resolution CRNN-Based Approach for Semi-Supervised Sound Event Detection in DCASE 2020 Challenge
This work proposes multi-resolution feature extraction for sound event detection, hypothesizing that different resolutions may be better suited to detecting different sound event categories and that combining the information provided by multiple resolutions could improve the performance of sound event detection systems.
A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes
A benchmark of submissions to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge Task 4, representing a sampling of the state of the art in sound event detection, is proposed; results show that systems adapted to provide coarse segmentation outputs are more robust to different target-to-non-target signal-to-noise ratios and to the time localization of the original event.
Comparative Assessment of Data Augmentation for Semi-Supervised Polyphonic Sound Event Detection
This work proposes a CRNN system exploiting unlabeled data with semi-supervised learning based on the "Mean Teacher" method, in combination with data augmentation to overcome the limited size of the training dataset and further improve performance.
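The Mean Teacher method recurring in these systems maintains a "teacher" copy of the model whose weights are an exponential moving average (EMA) of the "student" model's weights; the teacher's predictions on unlabeled clips serve as consistency targets. A toy sketch of the EMA update, with models reduced to plain weight lists for illustration:

```python
def ema_update(teacher, student, alpha=0.999):
    """Mean Teacher weight update: t <- alpha*t + (1-alpha)*s.
    The smoothed teacher supplies consistency targets for the
    student on unlabeled data."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]

teacher, student = [0.0, 0.0], [1.0, 1.0]   # toy 2-weight "models"
for _ in range(1000):                        # student held fixed here
    teacher = ema_update(teacher, student, alpha=0.99)
print(teacher)                               # teacher drifts close to [1.0, 1.0]
```

The decay `alpha` controls how far the teacher lags behind the student: values near 1 average over many past student states, which is what makes the teacher's pseudo-targets more stable than the student's own predictions.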
A sound event detection system for Task 4 of DCASE 2020 is built by integrating several methods, such as signal enhancement and event boundary detection; five systems are built by integrating these methods with a supervised system trained using the Mean Teacher model.
Cross-Domain Sound Event Detection: From Synthesized Audio to Real Audio (Technical Report)
This technical report describes the system submitted to DCASE 2020 Task 4, Sound Event Detection in Domestic Environments, and proposes a DACRNN network for joint learning of sound event detection and domain adaptation.
Performance analysis of weakly-supervised sound event detection system based on the mean-teacher convolutional recurrent neural network model
Performance was analyzed with respect to parameters such as the feature, the moving-average parameter, the weight of the consistency cost function, the ramp-up length, and the maximum learning rate, using the data of DCASE 2020 Task 4.
Sound Event Detection in Synthetic Domestic Environments
A comparative analysis of the performance of state-of-the-art sound event detection systems is presented, based on the results of Task 4 of the DCASE 2019 challenge, where submitted systems were evaluated on a series of synthetic soundscapes that allow careful control of different soundscape characteristics.
Sound Event Detection and Separation: A Benchmark on Desed Synthetic Soundscapes
  • Nicolas Turpault, R. Serizel, J. Salamon
  • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021
It is shown that temporal localization of sound events remains a challenge for SED systems and that reverberation and non-target sound events severely degrade system performance.
Self-Trained Audio Tagging and Sound Event Detection in Domestic Environments
This paper uses a forward-backward convolutional recurrent neural network (FBCRNN) for tagging and pseudo-labeling, followed by tag-conditioned sound event detection (SED) models trained using strong pseudo-labels provided by the FBCRNN, and introduces a strong label loss in the objective of the FBCRNN to take advantage of the strongly labeled synthetic data during training.


Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments
This paper presents DCASE 2018 Task 4, which evaluates systems for the large-scale detection of sound events using weakly labeled data (without time boundaries) and explores the possibility of exploiting a large amount of unbalanced and unlabeled training data together with a small weakly labeled training set to improve system performance.
Sound Event Detection from Partially Annotated Data: Trends and Challenges
A detailed analysis of the impact of the time segmentation, the event classification and the methods used to exploit unlabeled data on the final performance of sound event detection systems is proposed.
HODGEPODGE: Sound Event Detection Based on Ensemble of Semi-Supervised Learning Methods
In this paper, we present a method called HODGEPODGE for large-scale detection of sound events using the weakly labeled, synthetic, and unlabeled data proposed in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge.
A Closer Look at Weak Label Learning for Audio Events
This work describes a CNN-based approach for weakly supervised training of audio events, describes important characteristics which naturally arise in weakly supervised learning of sound events, and shows how these aspects of weak labels affect the generalization of models.
Adaptive Pooling Operators for Weakly Labeled Sound Event Detection
This paper treats SED as a multiple instance learning (MIL) problem, where training labels are static over a short excerpt, indicating the presence or absence of sound sources but not their temporal locality, and develops a family of adaptive pooling operators, referred to as autopool, which smoothly interpolate between common pooling operators and automatically adapt to the characteristics of the sound sources in question.
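The interpolation idea behind autopool can be sketched in a few lines: a softmax-weighted average over per-frame scores, where a single parameter `alpha` slides the pooling between the mean (alpha = 0) and the max (alpha large). This is a simplified, fixed-`alpha` illustration; in the paper `alpha` is learned per class.

```python
import math

def autopool(xs, alpha):
    """Softmax-weighted average of per-frame scores.
    alpha = 0 recovers mean pooling; alpha -> +inf approaches max pooling."""
    m = max(xs)
    weights = [math.exp(alpha * (x - m)) for x in xs]   # stable softmax weights
    total = sum(weights)
    return sum(x * w for x, w in zip(xs, weights)) / total

frames = [0.1, 0.9, 0.2, 0.4]      # per-frame event probabilities
print(autopool(frames, 0.0))       # alpha = 0: plain mean -> 0.4
print(autopool(frames, 50.0))      # large alpha: approaches max -> ~0.9
```

Because the weights depend on the scores themselves, a briefly active source can dominate the clip-level prediction without the brittleness of a hard max, which is exactly what MIL training with only clip-level labels needs.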
DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System
This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset.
Scaper: A library for soundscape synthesis and augmentation
Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined "specification", increasing the variability of the output.
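The core of what such a sequencer automates — sampling an onset time and an SNR for each isolated event, mixing it into a background, and emitting the resulting strong (timestamped) labels — can be illustrated with a toy mixer. This is a deliberately simplified sketch of the concept and is not Scaper's actual API; every function and parameter name here is invented for illustration:

```python
import math, random

def synthesize(background, events, sr=16000, seed=0):
    """Toy soundscape synthesis: overlay isolated events onto a
    background at sampled onsets and SNRs, returning the mix plus
    strong labels as (label, onset_s, offset_s) tuples."""
    rng = random.Random(seed)
    rms = lambda x: math.sqrt(sum(v * v for v in x) / len(x)) + 1e-12
    mix, labels = list(background), []
    for label, wav in events:
        onset = rng.randrange(0, len(background) - len(wav))
        snr_db = rng.uniform(0.0, 12.0)                 # sampled per event
        gain = rms(background) / rms(wav) * 10 ** (snr_db / 20.0)
        for i, v in enumerate(wav):
            mix[onset + i] += gain * v                  # overlay the event
        labels.append((label, onset / sr, (onset + len(wav)) / sr))
    return mix, labels

bg = [0.01 * math.sin(0.1 * n) for n in range(16000)]   # 1 s background
dog = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(3200)]  # 0.2 s "event"
mix, labels = synthesize(bg, [("dog_bark", dog)])
print(labels)   # one ('dog_bark', onset_s, offset_s) strong label
```

The point of the paper's probabilistic "specification" is that onset, duration, SNR, and event choice are all distributions rather than fixed values, so one specification yields unlimited labeled soundscapes — the property DESED exploits for its strongly labeled synthetic subset.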
Metrics for Polyphonic Sound Event Detection
This paper presents and discusses various metrics proposed for the evaluation of polyphonic sound event detection systems used in realistic situations, where there are typically multiple sound sources.
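The segment-based scoring used throughout these results can be computed directly from event lists: slice the timeline into fixed-length segments, mark each (segment, class) cell active if any event of that class overlaps it, and count true/false positives over cells. A minimal sketch with illustrative event lists (real evaluations use the sed_eval toolkit):

```python
def segment_f1(ref, est, duration, seg=1.0):
    """Segment-based F-score for polyphonic SED. Events are
    (label, onset_s, offset_s) tuples; `seg` is the segment length."""
    n = int(round(duration / seg))
    def activity(events):
        act = set()
        for label, onset, offset in events:
            for i in range(n):
                if onset < (i + 1) * seg and offset > i * seg:
                    act.add((i, label))         # class active in segment i
        return act
    r, e = activity(ref), activity(est)
    tp, fp, fn = len(r & e), len(e - r), len(r - e)
    p = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * rec / (p + rec) if p + rec else 0.0

ref = [("speech", 0.0, 2.5), ("dog", 4.0, 5.0)]
est = [("speech", 0.2, 2.0), ("dog", 4.5, 6.0)]
print(round(segment_f1(ref, est, duration=10.0), 3))   # 0.75
```

Event-based metrics, by contrast, match whole events under onset/offset tolerances, which is why the two numbers reported by the systems above (e.g. 40.89% event-based vs 66.17% segment-based) can differ so widely.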
Audio analysis for surveillance applications
The proposed hybrid solution is capable of detecting new kinds of suspicious audio events that occur as outliers against a background of usual activity; it adaptively learns a Gaussian mixture model of the background sounds and updates the model incrementally as new audio data arrive.
Guided Learning Convolution System for DCASE 2019 Task 4
The system submitted to DCASE 2019 Task 4, sound event detection (SED) in domestic environments, uses a convolutional neural network with an embedding-level attention pooling module and achieves the best performance among the participants.