An analysis of Sound Event Detection under acoustic degradation using multi-resolution systems

Diego de Benito-Gorrón, Daniel Ramos, Doroteo Torre Toledano
IberSPEECH 2021
The Sound Event Detection task aims to determine the temporal locations of acoustic events in audio clips. In recent years, the relevance of this field has been rising due to the introduction of datasets such as Google AudioSet or DESED (Domestic Environment Sound Event Detection) and competitive evaluations like the DCASE Challenge (Detection and Classification of Acoustic Scenes and Events). In this paper, we analyze the performance of Sound Event Detection systems under diverse artificial acoustic…

A Study on Robustness to Perturbations for Representations of Environmental Sound

This work imitates channel effects by injecting perturbations into the audio signal and measures the shift in the new (perturbed) embeddings with three distance measures, making the evaluation domain-dependent but not task-dependent.
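The idea of measuring how far an embedding moves under perturbation can be sketched in a few lines. This is a minimal illustration, not the method of the cited work: the summary does not name its three distance measures, so Euclidean, Manhattan, and cosine distance are assumptions, and `embedding_shift` is a hypothetical helper operating on plain lists.

```python
import math


def euclidean(a, b):
    """L2 distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """L1 distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def embedding_shift(clean, perturbed):
    """Hypothetical helper: report the shift between a clean embedding and
    the embedding of the perturbed signal under three assumed distances."""
    return {
        "euclidean": euclidean(clean, perturbed),
        "manhattan": manhattan(clean, perturbed),
        "cosine": cosine_distance(clean, perturbed),
    }


# Toy embeddings standing in for the output of an audio representation model.
shift = embedding_shift([1.0, 0.0], [0.0, 1.0])
```

Because the distances are computed on embeddings alone, no downstream labels are needed, which is what makes this kind of evaluation task-independent.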


A multi-resolution feature extraction approach is proposed, aiming to take advantage of the different lengths and spectral characteristics of each target category, which is able to outperform the baseline results.

Model Training that Prioritizes Rare Overlapped Labels for Polyphonic Sound Event Detection

The proposed method outperforms the baseline with respect to rare labels, with an average precision of 1.18 percentage points, and the experimental results demonstrate the effectiveness of the proposed method for both overlap of sound events and rare labels.



A Multi-Resolution CRNN-Based Approach for Semi-Supervised Sound Event Detection in DCASE 2020 Challenge

This work proposes a multi-resolution analysis for feature extraction in Sound Event Detection, hypothesizing that different resolutions can be more adequate for the detection of different sound event categories, and that combining the information provided by multiple resolutions could improve the performance of Sound Event Detection systems.
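The trade-off behind a multi-resolution front end can be made concrete with a little arithmetic: a longer analysis window gives narrower frequency bins but coarser time steps. The sketch below is illustrative only; the sample rate, window lengths, and hop lengths are assumed values, not the configurations used in the cited work.

```python
# Assumed values for illustration; not taken from the cited paper.
SAMPLE_RATE = 16000  # samples per second
RESOLUTIONS = {
    # name: (window_length, hop_length), both in samples
    "short": (1024, 128),
    "medium": (2048, 256),
    "long": (4096, 512),
}


def describe_resolution(sample_rate, window_length, hop_length):
    """Return the time and frequency resolution implied by an STFT setup."""
    return {
        "window_ms": 1000.0 * window_length / sample_rate,
        "frame_step_ms": 1000.0 * hop_length / sample_rate,
        "bin_width_hz": sample_rate / window_length,
    }


summary = {
    name: describe_resolution(SAMPLE_RATE, window, hop)
    for name, (window, hop) in RESOLUTIONS.items()
}

for name, info in summary.items():
    print(name, info)
```

Under these assumptions, short events would be better localized by the "short" setup, while events with fine spectral structure would benefit from the "long" one, which is the intuition for combining several resolutions.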

Comparative Assessment of Data Augmentation for Semi-Supervised Polyphonic Sound Event Detection

This work proposes a CRNN system exploiting unlabeled data with semi-supervised learning based on the “Mean teacher” method, in combination with data augmentation to overcome the limited size of the training dataset and to further improve performance.
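In the Mean Teacher method, the teacher model's weights are an exponential moving average (EMA) of the student's weights rather than being trained directly. A minimal sketch of that update follows, using plain lists in place of real network parameters; `ema_update` and the value of `alpha` are illustrative assumptions, not details from the cited work.

```python
def ema_update(teacher, student, alpha=0.999):
    """Return new teacher weights as an EMA of the student weights.

    alpha close to 1 makes the teacher change slowly; the default here
    is an assumed value chosen only for illustration.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


# Toy example: the teacher drifts toward a fixed student over three steps.
teacher = [0.0, 0.0]
student = [1.0, -2.0]
for _ in range(3):
    teacher = ema_update(teacher, student, alpha=0.5)
```

In a full training loop, the student would also be updated by gradient descent between EMA steps, with a consistency loss encouraging its predictions on unlabeled clips to match the teacher's.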

Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis

The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, which mixes part of last year's dataset with an additional synthetic, strongly labeled dataset provided this year, which is described in more detail.

Task-Aware Separation for the DCASE 2020 Task 4 Sound Event Detection and Separation Challenge

This work presents a permutation-invariant training scheme for optimizing the Source Separation system directly with the back-end Sound Event Detection objective without requiring joint training or fine-tuning of the two systems.

Self-Trained Audio Tagging and Sound Event Detection in Domestic Environments

This paper uses a forward-backward convolutional recurrent neural network (FBCRNN) for tagging and pseudo labeling, followed by tag-conditioned sound event detection (SED) models trained on strong pseudo labels provided by the FBCRNN. It also introduces a strong label loss in the FBCRNN objective to take advantage of the strongly labeled synthetic data during training.

Sound Event Detection from Partially Annotated Data: Trends and Challenges

A detailed analysis of the impact of the time segmentation, the event classification and the methods used to exploit unlabeled data on the final performance of sound event detection systems is proposed.

The Machine Learning Approach for Analysis of Sound Scenes and Events

This chapter explains the basic concepts in computational methods used for analysis of sound scenes and events, and focuses on the machine learning approach, where the sound categories to be analyzed are defined in advance.

Sound Event Detection and Separation: A Benchmark on Desed Synthetic Soundscapes

  • Nicolas Turpault, R. Serizel, J. Salamon
  • Physics, Computer Science
    ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
It is shown that temporal localization of sound events remains a challenge for SED systems and that reverberation and non-target sound events severely degrade system performance.

Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset

This work aims to study the implementation of several neural network-based systems for speech and music event detection over a collection of 77,937 10-second audio segments, selected from the Google AudioSet dataset.

Sound Event Detection in Synthetic Domestic Environments

A comparative analysis of the performance of state-of-the-art sound event detection systems based on the results of task 4 of the DCASE 2019 challenge, where submitted systems were evaluated on a series of synthetic soundscapes that allow us to carefully control for different soundscape characteristics.