Sound Event Detection and Separation: A Benchmark on Desed Synthetic Soundscapes

@article{Turpault2021SoundED,
  title={Sound Event Detection and Separation: A Benchmark on Desed Synthetic Soundscapes},
  author={Nicolas Turpault and Romain Serizel and Scott Wisdom and Hakan Erdogan and John R. Hershey and Eduardo Fonseca and Prem Seetharaman and Justin Salamon},
  journal={ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2021},
  pages={840--844}
}
  • Nicolas Turpault, R. Serizel, J. Salamon
  • Published 2 November 2020
  • Physics, Computer Science
  • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
We propose a benchmark of state-of-the-art sound event detection (SED) systems. We design synthetic evaluation sets to focus on specific sound event detection challenges. We analyze the performance of the submissions to DCASE 2020 Task 4 as a function of time-related modifications (time position of an event and length of clips) and study the impact of non-target sound events and reverberation. We show that temporal localization of sound events remains a challenge for SED systems. We also show…

A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes
TLDR
A benchmark of submissions to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge Task 4, representing a sampling of the state of the art in sound event detection, is proposed; results show that systems adapted to provide coarse segmentation outputs are more robust to different target to non-target signal-to-noise ratios and to the time localization of the original event.
An analysis of Sound Event Detection under acoustic degradation using multi-resolution systems
TLDR
This paper analyzes the performance of Sound Event Detection systems under diverse acoustic conditions such as high-pass or low-pass filtering, clipping or dynamic range compression, and provides insights on the benefits of this multi-resolution approach in different acoustic settings.
Selective Pseudo-labeling and Class-wise Discriminative Fusion for Sound Event Detection
TLDR
A novel selective pseudo-labeling approach, termed SPL, is proposed to produce high-confidence separated target events from blind sound separation outputs, which are then used to fine-tune the original SED model that was pre-trained on the sound mixtures in a multi-objective learning style.
Threshold Independent Evaluation of Sound Event Detection Scores
TLDR
A method which allows for computing system performance on an evaluation set for all possible thresholds jointly, enabling accurate computation not only of the PSD-ROC and PSDS but also of other collar-based and intersection-based performance curves.
Self-Trained Audio Tagging and Sound Event Detection in Domestic Environments
TLDR
This paper uses a forward-backward convolutional recurrent neural network (FBCRNN) for tagging and pseudo labeling, followed by tag-conditioned sound event detection (SED) models which are trained using strong pseudo labels provided by the FBCRNN, and introduces a strong label loss in the objective of the FBCRNN to take advantage of the strongly labeled synthetic data during training.
Couple Learning: Mean Teacher with PLG Model Improves the Results of Sound Event Detection
TLDR
An effective Couple Learning method that combines a well-trained model and a Mean Teacher model that reduces the noise impact in the pseudo-labels introduced by detection errors and increases strongly and weakly-labeled data to improve the Mean Teacher method’s performance.
Self-Supervised Learning from Automatically Separated Sound Scenes
TLDR
This paper explores the use of unsupervised automatic sound separation to decompose unlabeled sound scenes into multiple semantically-linked views for use in self-supervised contrastive learning and finds that learning to associate input mixtures with their automatically separated outputs yields stronger representations than past approaches that use the mixtures alone.
Couple Learning for semi-supervised sound event detection
TLDR
An effective Couple Learning method that combines a well-trained model and a Mean Teacher model that improves the Mean Teacher method’s performance and reduces the noise impact in the pseudo-labels introduced by detection errors is proposed.
Computational bioacoustics with deep learning: a review and roadmap
TLDR
This paper offers a subjective but principled roadmap for computational bioacoustics with deep learning: topics that the community should address, in order to make the most of future developments in AI and informatics, and to use audio data in answering zoological and ecological questions.
Self-training with noisy student model and semi-supervised loss function for DCASE 2021 Challenge Task 4
TLDR
The performance of the proposed SED model is evaluated on the validation set of the DCASE 2021 Challenge Task 4, and several ensemble models that combine five-fold validation models with different hyperparameters of the semi-supervised loss function are selected as the final models.

References

Sound Event Detection in Synthetic Domestic Environments
TLDR
A comparative analysis of the performance of state-of-the-art sound event detection systems based on the results of task 4 of the DCASE 2019 challenge, where submitted systems were evaluated on a series of synthetic soundscapes that allow us to carefully control for different soundscape characteristics.
Improving Sound Event Detection in Domestic Environments using Sound Separation
TLDR
This paper starts from a sound separation model trained on the Free Universal Sound Separation dataset and the DCASE 2020 task 4 sound event detection baseline, and explores different methods to combine separated sound sources and the original mixture within sound event detection.
Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis
TLDR
The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, which mixes part of last year's dataset with an additional synthetic, strongly labeled dataset provided this year, which is described in more detail.
Training Sound Event Detection on a Heterogeneous Dataset
TLDR
This work proposes a detailed analysis of the DCASE 2020 task 4 sound event detection baseline with regard to several aspects, such as the type of data used for training, the parameters of the mean-teacher, or the transformations applied while generating the synthetic soundscapes.
Cross-domain sound event detection: from synthesized audio to real audio Technical Report
TLDR
This technical report describes some information about the system submitted to DCASE 2020 Task 4, Sound Event Detection in Domestic Environments, and proposes a DACRNN network for joint learning of sound event detection and domain adaptation.
Sound Event Detection from Partially Annotated Data: Trends and Challenges
TLDR
A detailed analysis of the impact of the time segmentation, the event classification and the methods used to exploit unlabeled data on the final performance of sound event detection systems is proposed.
Scaper: A library for soundscape synthesis and augmentation
TLDR
Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined, “specification”, to increase the variability of the output.
Metrics for Polyphonic Sound Event Detection
This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources
Semi-Supervised Sound Event Detection Based on Mean Teacher with Power Pooling and Data Augmentation Technical Report
TLDR
The details of the system submitted to DCASE 2020 Task 4, sound event detection (SED) and separation in domestic environments, are described; the system mainly focuses on the scenario that recognizes sound events without source separation.
Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments
TLDR
This paper presents DCASE 2018 task 4, which evaluates systems for the large-scale detection of sound events using weakly labeled data (without time boundaries) and explores the possibility of exploiting a large amount of unbalanced and unlabeled training data together with a small weakly labeled training set to improve system performance.