A Multi-Resolution CRNN-Based Approach for Semi-Supervised Sound Event Detection in DCASE 2020 Challenge

Diego De Benito-Gorrón, Daniel Ramos, Doroteo Torre Toledano
IEEE Access
Sound Event Detection is a task that has gained relevance in recent years in the field of audio signal processing, due to the creation of specific datasets such as Google AudioSet or DESED (Domestic Environment Sound Event Detection) and the introduction of competitive evaluations like the DCASE Challenge (Detection and Classification of Acoustic Scenes and Events). The different categories of acoustic events can present diverse temporal and spectral characteristics. However, most…

An analysis of Sound Event Detection under acoustic degradation using multi-resolution systems

This paper analyzes the performance of Sound Event Detection systems under diverse artificial acoustic conditions such as high- or low-pass filtering and clipping or dynamic range compression, as well as under a scenario of high overlap between events.

Sound Event Detection Using Attention and Aggregation-Based Feature Pyramid Network

This paper proposes a sound event detection (SED) model using an EfficientNet-B2 and an attention and aggregation-based feature pyramid network (A2-FPN) that is trained by the mean-teacher approach to utilize weakly labeled and unlabeled data.

Temporal coding with magnitude-phase regularization for sound event detection

This paper proposes a novel temporal coding of magnitude and phase for embedding vectors in an intermediate layer that results in a notable improvement in timing sensitivity compared to a baseline system tested on the SED task in the context of the DCASE2021 challenge.

Feedback Module Based Convolution Neural Networks for Sound Event Classification

A weighted recurrent-inference-based model that employs cascading feedback modules is proposed for sound event classification; it outperforms traditional approaches in indoor and outdoor conditions by 1.94% and 3.26%, respectively.

Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention

A convolutional recurrent neural network model based on the temporal-frequency attention mechanism and feature space (FS) attention mechanism (TFFS-CRNN) has better classification performance and lower ER in polyphonic SED.


A multi-resolution feature extraction approach is proposed, aiming to take advantage of the different lengths and spectral characteristics of each target category, which is able to outperform the baseline results.

Acoustic Scene Classification using Attention based Deep Learning Model

  • Mie Mie Oo, Nu War
  • Computer Science
    International Journal of Intelligent Engineering and Systems
  • 2022
An end-to-end deep residual network with embedded channel attention is explored to learn discriminative features from audio scenes, yielding classification results with an average accuracy of 80.82%.

The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks

This paper formalizes the task as the cocktail fork problem, presents the Divide and Remaster dataset to foster research on this topic, and introduces a new mixed-STFT-resolution model to better address the variety of acoustic characteristics of the three source types.

Improving Induced Valence Recognition by Integrating Acoustic Sound Semantics in Movies

This work explores the use of a cross-modal attention mechanism in modeling how verbal and non-verbal human sound semantics affect induced valence, jointly with conventional audio-visual content-based modeling.



Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis

The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, which mixes part of last year's dataset with an additional synthetic, strongly labeled dataset provided this year, which is described in more detail.

Sound Event Detection from Partially Annotated Data: Trends and Challenges

A detailed analysis of the impact of the time segmentation, the event classification and the methods used to exploit unlabeled data on the final performance of sound event detection systems is proposed.

Audio Event Detection using Weakly Labeled Data

It is shown that audio event detection using weak labels can be formulated as a multiple-instance learning problem, and two frameworks for solving multiple-instance learning are suggested, one based on support vector machines and the other on neural networks.

A Framework for the Robust Evaluation of Sound Event Detection

A new framework for performance evaluation of polyphonic sound event detection (SED) systems is defined, which overcomes the limitations of the conventional collar-based event decisions, event F-scores and event error rates and introduces a definition of event detection that is more robust against labelling subjectivity.

Training Sound Event Detection on a Heterogeneous Dataset

This work proposes a detailed analysis of the DCASE 2020 task 4 sound event detection baseline with regard to several aspects, such as the type of data used for training, the parameters of the mean-teacher, or the transformations applied while generating the synthetic soundscapes.

Task-Aware Mean Teacher Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection

A task-aware mean teacher method is proposed, using a convolutional recurrent neural network (CRNN) with a multi-branch structure to solve the SED and AT tasks differently; results demonstrate the superiority of the proposed method on the DCASE2018 challenge.

Waveform-based End-to-end Deep Convolutional Neural Network with Multi-scale Sliding Windows for Weakly Labeled Sound Event Detection

  • Seokjin Lee, Minhan Kim
  • Computer Science
    2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC)
  • 2020
A waveform-based end-to-end sound event detection algorithm that detects and classifies sound events using a deep convolutional neural network architecture is proposed, which consists of multi-scale time frames and networks that handle both short and long signal characteristics.

Metrics for Polyphonic Sound Event Detection

This paper presents and discusses various metrics proposed for the evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources.

Sound Event Detection in Synthetic Domestic Environments

A comparative analysis of the performance of state-of-the-art sound event detection systems is presented, based on the results of task 4 of the DCASE 2019 challenge, where submitted systems were evaluated on a series of synthetic soundscapes that allow us to carefully control for different soundscape characteristics.

Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset

This work aims to study the implementation of several neural network-based systems for speech and music event detection over a collection of 77,937 10-second audio segments, selected from the Google AudioSet dataset.