• Corpus ID: 220403357

Training Sound Event Detection on a Heterogeneous Dataset

Nicolas Turpault, Romain Serizel
Training a sound event detection algorithm on a heterogeneous dataset that includes both recorded and synthetic soundscapes with varying labeling granularity is a non-trivial task, and the resulting systems require several technical choices. These choices are often passed from one system to another without being questioned. We propose a detailed analysis of the DCASE 2020 task 4 sound event detection baseline with regard to several aspects, such as the type of data used…


Sound Event Detection and Separation: A Benchmark on Desed Synthetic Soundscapes
  • Nicolas Turpault, R. Serizel, J. Salamon
    ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
It is shown that temporal localization of sound events remains a challenge for SED systems and that reverberation and non-target sound events severely degrade system performance.
A Multi-Resolution CRNN-Based Approach for Semi-Supervised Sound Event Detection in DCASE 2020 Challenge
This work proposes a multi-resolution analysis for feature extraction in sound event detection, hypothesizing that different resolutions can be more adequate for detecting different sound event categories, and that combining the information provided by multiple resolutions could improve the performance of sound event detection systems.
Improving Sound Event Detection in Domestic Environments using Sound Separation
This paper starts from a sound separation model trained on the Free Universal Sound Separation dataset and the DCASE 2020 task 4 sound event detection baseline, and explores different methods to combine separated sound sources and the original mixture within the sound event detection system.
Sound Event Detection with Cross-Referencing Self-Training
This approach takes advantage of semi-supervised training using pseudo-labels from a balanced student-teacher model, and outperforms the DCASE 2021 challenge baseline in terms of the Polyphonic Sound Detection Score.
Cross-Referencing Self-Training Network for Sound Event Detection in Audio Mixtures
This study proposes a semi-supervised method for generating pseudo-labels from unsupervised data using a student-teacher scheme that balances self-training and cross-training, and explores post-processing that extracts sound intervals from network predictions, for further improvement in sound event detection performance.
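A common ingredient of such student-teacher pseudo-labeling schemes is a teacher whose weights track an exponential moving average (EMA) of the student's. The sketch below is a generic illustration of that update, not code from either paper; the function and parameter names are ours:

```python
def ema_update(teacher, student, decay=0.999):
    """Move each teacher weight toward the corresponding student weight
    with an exponential moving average; the slower-moving teacher then
    produces pseudo-labels for unlabeled clips."""
    return {name: decay * teacher[name] + (1.0 - decay) * student[name]
            for name in teacher}

# Toy one-parameter "models" to show the update direction:
teacher = {"w": 1.0}
student = {"w": 0.0}
teacher = ema_update(teacher, student, decay=0.9)  # teacher["w"] -> 0.9
```

In practice the update runs after every optimizer step, so the teacher is a smoothed, more stable copy of the student.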
A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes
A benchmark of submissions to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge Task 4, representing a sampling of the state of the art in sound event detection, is proposed; results show that systems adapted to provide coarse segmentation outputs are more robust to different target-to-non-target signal-to-noise ratios and to the time localization of the original event.
An analysis of Sound Event Detection under acoustic degradation using multi-resolution systems
This paper analyzes the performance of sound event detection systems under diverse artificial acoustic conditions, such as high- or low-pass filtering and clipping or dynamic range compression, as well as under a scenario of high overlap between events.
Self-Training for Sound Event Detection in Audio Mixtures
A self-training technique to leverage unlabeled datasets in supervised learning using pseudo label estimation and a dual-term objective function: a classification loss for the original labels and expectation loss for pseudo labels is proposed.
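The dual-term objective described above can be illustrated as follows. This is only a sketch: the paper's "expectation loss" has its own formulation, and here plain binary cross-entropy against soft pseudo-labels stands in for it; all function and variable names are ours:

```python
import numpy as np

def bce(p, y):
    """Binary cross-entropy of predicted probabilities p against targets y."""
    eps = 1e-7
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def dual_term_loss(p_lab, y_lab, p_unlab, pseudo, weight=1.0):
    """Classification loss on labeled clips plus a weighted loss
    against pseudo-labels estimated for unlabeled clips."""
    return bce(p_lab, y_lab) + weight * bce(p_unlab, pseudo)

loss = dual_term_loss(np.array([0.5]), np.array([1.0]),
                      np.array([0.5]), np.array([0.0]))
# with these inputs each term is -log(0.5), so loss == 2 * log(2)
```

The `weight` knob controls how much the unlabeled data influences training, which is typically ramped up as pseudo-label quality improves.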
Peer Collaborative Learning for Polyphonic Sound Event Detection
This paper describes how semi-supervised learning called peer collaborative learning (PCL) can be applied to the polyphonic sound event detection (PSED) task, which is one of the tasks in the DCASE challenge.
This paper presents the systems proposed for the DCASE 2021 challenge Task 4 (sound event detection and separation in domestic environments). The aim is to provide the event time localization…


Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis
The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, mixing part of last year's dataset with an additional synthetic, strongly labeled dataset provided this year that is described in more detail.
Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments
This paper presents DCASE 2018 task 4, which evaluates systems for the large-scale detection of sound events using weakly labeled data (without time boundaries) and explores the possibility of exploiting a large amount of unbalanced and unlabeled training data together with a small weakly labeled training set to improve system performance.
Adaptive Pooling Operators for Weakly Labeled Sound Event Detection
This paper treats SED as a multiple instance learning (MIL) problem, where training labels are static over a short excerpt, indicating the presence or absence of sound sources but not their temporal locality, and develops a family of adaptive pooling operators (referred to as autopool) which smoothly interpolate between common pooling operators and automatically adapt to the characteristics of the sound sources in question.
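The interpolation behavior described above can be sketched in a few lines. This is an illustrative NumPy version of the idea, not the paper's released code; the function and parameter names are ours:

```python
import numpy as np

def autopool(p, alpha):
    """Pool frame-level probabilities p (shape (T,)) into one clip-level
    score.  alpha = 0 reduces to mean pooling; a large alpha approaches
    max pooling, so the learned scalar interpolates between the two."""
    w = np.exp(alpha * p)
    w /= w.sum()                 # softmax weights over time frames
    return float(np.sum(w * p))  # weighted average of the frames

p = np.array([0.1, 0.9, 0.2])
mean_like = autopool(p, alpha=0.0)   # equals p.mean() == 0.4
max_like = autopool(p, alpha=50.0)   # close to p.max() == 0.9
```

In a network, `alpha` is a trainable parameter (one per class), so each sound category can learn how "peaky" its pooling should be.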
Sound Event Detection in Synthetic Domestic Environments
A comparative analysis of the performance of state-of-the-art sound event detection systems is presented, based on the results of task 4 of the DCASE 2019 challenge, where submitted systems were evaluated on a series of synthetic soundscapes that allow careful control of different soundscape characteristics.
TUT database for acoustic scene classification and sound event detection
The recording and annotation procedure, the database content, a recommended cross-validation setup, and the performance of a supervised acoustic scene classification system and an event detection baseline system using mel-frequency cepstral coefficients and Gaussian mixture models are presented.
A Framework for the Robust Evaluation of Sound Event Detection
A new framework for performance evaluation of polyphonic sound event detection (SED) systems is defined, which overcomes the limitations of the conventional collar-based event decisions, event F-scores and event error rates and introduces a definition of event detection that is more robust against labelling subjectivity.
Scaper: A library for soundscape synthesis and augmentation
Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined "specification", increasing the variability of the output.
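The "probabilistically defined specification" idea can be sketched generically. Note this is NOT Scaper's actual API, only an illustration of sampling concrete soundscapes from a distribution-valued spec; all names are ours:

```python
import random

def generate(spec, rng):
    """Sample one concrete soundscape (a list of event dicts) from a
    specification whose fields are ranges/choices rather than values."""
    events = []
    for _ in range(rng.randint(*spec["n_events"])):
        events.append({
            "label": rng.choice(spec["labels"]),
            "onset": rng.uniform(0.0, spec["duration"]),
            "snr_db": rng.uniform(*spec["snr_db"]),
        })
    return events

spec = {"duration": 10.0, "n_events": (1, 3),
        "labels": ["dog", "speech", "alarm"], "snr_db": (0.0, 20.0)}
soundscape = generate(spec, random.Random(0))  # one sampled soundscape
```

Calling `generate` repeatedly with different seeds yields many distinct soundscapes from the same spec, which is what makes the approach useful for data augmentation.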
Metrics for Polyphonic Sound Event Detection
This paper presents and discusses various metrics proposed for the evaluation of polyphonic sound event detection systems used in realistic situations, where there are typically multiple sound sources present simultaneously.
Detection and Classification of Acoustic Scenes and Events
The state of the art in automatically classifying audio scenes, and in automatically detecting and classifying audio events, is reported on.
FSD50K: An Open Dataset of Human-Labeled Sound Events
FSD50K is introduced, an open dataset containing over 51k audio clips totalling over 100 hours of audio manually labeled using 200 classes drawn from the AudioSet Ontology, to provide an alternative benchmark dataset and thus foster sound event recognition (SER) research.