Sound Event Detection in Synthetic Domestic Environments

@inproceedings{serizel2020sound,
  title={Sound Event Detection in Synthetic Domestic Environments},
  author={Serizel, Romain and Turpault, Nicolas and Shah, Ankit and Salamon, Justin},
  booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2020}
}
  • R. Serizel, N. Turpault, A. Shah, J. Salamon
  • Published 1 May 2020
  • Computer Science
  • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
We present a comparative analysis of the performance of state-of-the-art sound event detection systems. In particular, we study the robustness of the systems to noise and signal degradation, which is known to impact model generalization. Our analysis is based on the results of task 4 of the DCASE 2019 challenge, where submitted systems were evaluated on, in addition to real-world recordings, a series of synthetic soundscapes that allow us to carefully control for different soundscape… 


Sound Event Detection and Separation: A Benchmark on Desed Synthetic Soundscapes
  • Nicolas Turpault, R. Serizel, J. Salamon
  • Physics, Computer Science
    ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
It is shown that temporal localization of sound events remains a challenge for SED systems and that reverberation and non-target sound events severely degrade system performance.
A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes
A benchmark of submissions to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge Task 4, representing a sampling of the state of the art in sound event detection, is proposed. Results show that systems adapted to provide coarse segmentation outputs are more robust to different target-to-non-target signal-to-noise ratios and to the time localization of the original events.
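The benchmarks above vary the target-to-non-target signal-to-noise ratio when synthesizing soundscapes. As a minimal sketch of how such mixing works (the function name and RMS-based convention are illustrative assumptions, not taken from any of the cited systems), an event can be rescaled so it sits a desired number of dB above the background:

```python
import numpy as np

def scale_to_snr(event, background, target_snr_db):
    """Scale `event` so its RMS level sits `target_snr_db` dB above `background`."""
    rms_event = np.sqrt(np.mean(event ** 2))
    rms_background = np.sqrt(np.mean(background ** 2))
    current_snr_db = 20 * np.log10(rms_event / rms_background)
    gain = 10 ** ((target_snr_db - current_snr_db) / 20)
    return event * gain
```

Mixing `background + scale_to_snr(event, background, 6.0)` then yields a soundscape in which the event is 6 dB above the background, which is the kind of controlled degradation these benchmarks sweep over.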
An analysis of Sound Event Detection under acoustic degradation using multi-resolution systems
This paper analyzes the performance of Sound Event Detection systems under diverse acoustic conditions such as high-pass or low-pass filtering, clipping or dynamic range compression, and provides insights on the benefits of this multi-resolution approach in different acoustic settings.
Training Sound Event Detection on a Heterogeneous Dataset
This work proposes to perform a detailed analysis of DCASE 2020 task 4 sound event detection baseline with regards to several aspects such as the type of data used for training, the parameters of the mean-teacher or the transformations applied while generating the synthetic soundscapes.
The impact of non-target events in synthetic soundscapes for sound event detection
The results show that using both target and non-target events in only one of the phases (validation or training) helps the system properly detect sound events, outperforming the baseline (which uses non-target events in both phases).
Improving Sound Event Detection in Domestic Environments using Sound Separation
This paper starts from a sound separation model trained on the Free Universal Sound Separation dataset and the DCASE 2020 Task 4 sound event detection baseline, and explores different methods to combine separated sound sources and the original mixture within the sound event detection system.
A Multi-Resolution CRNN-Based Approach for Semi-Supervised Sound Event Detection in DCASE 2020 Challenge
This work proposes a multi-resolution analysis for feature extraction in Sound Event detection, hypothesizing that different resolutions can be more adequate for the detection of different sound event categories, and that combining the information provided by multiple resolutions could improve the performance of Sound Event Detection systems.
A number of problems associated with the development of sound event detection systems are presented, such as the deviation across environments and sound categories, overlapping audio events, and unreliable training data.
A Comprehensive Review of Polyphonic Sound Event Detection
This paper aims to provide an in-depth discussion of different methodologies proposed by various authors that include the features used, detection algorithms, and their corresponding accuracy and limitations.
Sound event aware environmental sound segmentation with Mask U-Net
An environmental sound segmentation method called Mask U-Net is proposed, which robustly differentiates sound event lengths among classes and was faster than conventional methods and showed high segmentation performance.


Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis
The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, which combines part of the previous year's dataset with an additional synthetic, strongly labeled dataset provided this year and described in more detail.
Metrics for Polyphonic Sound Event Detection
This paper presents and discusses various metrics proposed for the evaluation of polyphonic sound event detection systems used in realistic situations, where there are typically multiple simultaneous sound sources.
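One family of metrics discussed in that line of work is segment-based evaluation: the timeline is divided into fixed-length segments and reference and estimated class activity are compared per segment. A minimal sketch of a micro-averaged segment-based F1 (the function name and binary-matrix representation are illustrative assumptions):

```python
import numpy as np

def segment_based_f1(ref, est):
    """Micro-averaged segment-based F1.

    ref, est: binary arrays of shape (n_segments, n_classes) marking
    whether each class is active in each time segment."""
    tp = np.logical_and(ref == 1, est == 1).sum()
    fp = np.logical_and(ref == 0, est == 1).sum()
    fn = np.logical_and(ref == 1, est == 0).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
```

Segment-based scores are tolerant of small onset/offset errors within a segment, which is one reason they are popular for polyphonic evaluation.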
Audio analysis for surveillance applications
The proposed hybrid solution is capable of detecting new kinds of suspicious audio events that occur as outliers against a background of usual activity. It adaptively learns a Gaussian mixture model to represent the background sounds and updates the model incrementally as new audio data arrives.
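The background-model-plus-outlier idea can be sketched with a single diagonal Gaussian in place of the paper's adaptive mixture (a deliberate simplification; all names are hypothetical): fit the model on background feature frames, then flag frames whose log-likelihood falls below a threshold.

```python
import numpy as np

def fit_background(frames):
    """Fit a diagonal Gaussian to background feature frames (n_frames, n_dims)."""
    return frames.mean(axis=0), frames.var(axis=0) + 1e-8

def log_likelihood(frames, mean, var):
    """Per-frame log-likelihood under the diagonal Gaussian background model."""
    return -0.5 * (np.log(2 * np.pi * var) + (frames - mean) ** 2 / var).sum(axis=1)

def detect_outliers(frames, mean, var, threshold):
    """Flag frames that the background model finds too unlikely (candidate events)."""
    return log_likelihood(frames, mean, var) < threshold
```

The full approach additionally updates the mixture incrementally, so the notion of "background" tracks slow changes in the environment rather than staying fixed.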
Scaper: A library for soundscape synthesis and augmentation
Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined "specification", increasing the variability of the output.
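Scaper's core idea, a probabilistic specification from which many concrete soundscapes are sampled, can be illustrated with a toy sequencer. This is not Scaper's actual API; the distribution-tuple convention and function names below are illustrative assumptions:

```python
import random

def sample(spec, rng):
    """Resolve a distribution tuple, e.g. ('const', 3), ('uniform', 0, 9), ('choose', [...])."""
    kind, *args = spec
    if kind == "const":
        return args[0]
    if kind == "uniform":
        return rng.uniform(args[0], args[1])
    if kind == "choose":
        return rng.choice(args[0])
    raise ValueError(f"unknown distribution: {kind}")

def generate_soundscape(event_specs, rng):
    """Sample one concrete event list from a probabilistic specification."""
    return [
        {
            "label": sample(spec["label"], rng),
            "onset": sample(spec["event_time"], rng),
            "duration": sample(spec["event_duration"], rng),
            "snr": sample(spec["snr"], rng),
        }
        for spec in event_specs
    ]
```

Calling `generate_soundscape` repeatedly with the same specification yields different concrete soundscapes, which mirrors how such a sequencer increases output variability from a single definition.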
Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments
This paper presents DCASE 2018 Task 4, which evaluates systems for the large-scale detection of sound events using weakly labeled data (without time boundaries) and explores the possibility of exploiting a large amount of unbalanced and unlabeled training data together with a small weakly labeled training set to improve system performance.
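With weakly labeled data, only clip-level tags are available, so training typically pools frame-level predictions into a clip-level prediction before computing the loss. A minimal sketch of the common max/mean pooling choice (the function name is an illustrative assumption, not a specific system's API):

```python
import numpy as np

def clip_probabilities(frame_probs, pooling="max"):
    """Pool frame-level class probabilities (n_frames, n_classes) to clip level.

    Max pooling tags a clip with a class if any single frame activates it strongly;
    mean pooling requires the class to be active over much of the clip."""
    if pooling == "max":
        return frame_probs.max(axis=0)
    if pooling == "mean":
        return frame_probs.mean(axis=0)
    raise ValueError(f"unknown pooling: {pooling}")
```

The pooling choice matters: max pooling suits short, transient events, while mean pooling biases the model toward long events, which connects to the time-localization issues discussed across these papers.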
Sound Event Detection from Partially Annotated Data: Trends and Challenges
A detailed analysis of the impact of the time segmentation, the event classification and the methods used to exploit unlabeled data on the final performance of sound event detection systems is proposed.
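Time segmentation in such systems commonly means binarizing frame-level posteriors, smoothing them with a median filter to remove spurious short segments, and reading off contiguous events. A minimal sketch under those assumptions (window length, hop size, and function names are illustrative):

```python
import numpy as np

def median_smooth(activity, win=7):
    """Median-filter a binary frame-activity sequence to suppress short glitches."""
    pad = win // 2
    padded = np.pad(activity, pad, mode="edge")
    return np.array([np.median(padded[i:i + win]) for i in range(len(activity))])

def frames_to_events(activity, hop=0.02):
    """Turn a binary frame sequence into (onset, offset) pairs in seconds."""
    events, onset = [], None
    for i, a in enumerate(activity):
        if a and onset is None:
            onset = i
        elif not a and onset is not None:
            events.append((onset * hop, i * hop))
            onset = None
    if onset is not None:
        events.append((onset * hop, len(activity) * hop))
    return events
```

The median filter both removes isolated false-positive frames and fills single-frame dropouts inside an event, trading fine temporal resolution for more stable event boundaries.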
Feature learning with deep scattering for urban sound analysis
  • J. Salamon, J. Bello
  • Computer Science
    2015 23rd European Signal Processing Conference (EUSIPCO)
  • 2015
It is shown that the scattering transform can be used as an alternative signal representation to the mel-spectrogram whilst reducing both the amount of training data required for feature learning and the size of the learned codebook by an order of magnitude.
Computational Analysis of Sound Scenes and Events
This book presents computational methods for extracting useful information from audio signals, collecting the state of the art in the field of sound event and scene analysis, and gives an overview of methods for the computational analysis of sound scenes and events.
In this paper, we present a method to detect sound events in domestic environments using small weakly labeled data, large unlabeled data, and strongly labeled synthetic data, as proposed in the…
HODGEPODGE: Sound Event Detection Based on Ensemble of Semi-Supervised Learning Methods
In this paper, we present a method called HODGEPODGE for large-scale detection of sound events using weakly labeled, synthetic, and unlabeled data, as proposed in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge.