Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks

@article{adavanne2019sound,
  title={Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks},
  author={Sharath Adavanne and Archontis Politis and Joonas Nikunen and Tuomas Virtanen},
  journal={IEEE Journal of Selected Topics in Signal Processing},
  year={2019}
}
In this paper, we propose a convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space. As the first output, sound event detection (SED) is performed as a multi-label classification task on each time frame, producing temporal activity for all the sound event classes.
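The per-frame multi-label SED output described above can be decoded by applying a sigmoid per class and thresholding, so several classes may be active in the same frame. A minimal numpy sketch (the threshold value and array names are illustrative, not taken from the paper):

```python
import numpy as np

def decode_sed(frame_logits, threshold=0.5):
    """Turn per-frame, per-class logits into binary activity.

    frame_logits: array of shape (n_frames, n_classes) of raw scores.
    Returns a 0/1 activity matrix of the same shape; unlike softmax,
    the per-class sigmoid lets overlapping events fire simultaneously.
    """
    probs = 1.0 / (1.0 + np.exp(-frame_logits))  # sigmoid per class
    return (probs >= threshold).astype(int)

# Two frames, three classes: frame 0 has two overlapping events.
logits = np.array([[2.0, 1.5, -3.0],
                   [-1.0, -2.0, 0.7]])
activity = decode_sed(logits)
# activity -> [[1, 1, 0], [0, 0, 1]]
```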

Sound event localization and detection based on crnn using rectangular filters and channel rotation data augmentation

The proposed system is a convolutional recurrent neural network using rectangular filters specialized in recognizing significant spectral features related to the task, considerably improving Error Rate and F-score for location-aware detection.

U Recurrent Neural Network for Polyphonic Sound Event Detection and Localization

A novel model called the U recurrent neural network (URNN) is proposed to alleviate problems in polyphonic sound event detection and localization: it combines low-level and high-level features without significantly increasing computation cost, and exploits identity layers to make the network deeper.

Hierarchical Detection of Sound Events and their Localization Using Convolutional Neural Networks with Adaptive Thresholds

The system is based on multi-channel convolutional neural networks, combined with data augmentation and ensembling, and follows a hierarchical approach that first determines adaptive thresholds for the multi-label sound event detection (SED) problem, based on a CNN operating on spectrograms over long-duration windows.
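One common way to realize per-class adaptive thresholds (a generic sketch, not this paper's exact procedure) is to sweep candidate thresholds over held-out scores and keep, for each class, the one that maximizes F1:

```python
import numpy as np

def adaptive_thresholds(scores, labels, candidates=np.linspace(0.1, 0.9, 17)):
    """Pick one detection threshold per class by maximizing F1 on
    held-out data. scores, labels: (n_frames, n_classes) arrays,
    where labels hold 0/1 ground-truth activity."""
    n_classes = scores.shape[1]
    best = np.full(n_classes, 0.5)
    for c in range(n_classes):
        best_f1 = -1.0
        for t in candidates:
            pred = scores[:, c] >= t
            tp = np.sum(pred & (labels[:, c] == 1))
            fp = np.sum(pred & (labels[:, c] == 0))
            fn = np.sum(~pred & (labels[:, c] == 1))
            f1 = 2 * tp / max(2 * tp + fp + fn, 1)
            if f1 > best_f1:           # keep first threshold reaching best F1
                best_f1, best[c] = f1, t
    return best

# One class: positives score 0.8/0.9, negatives 0.2/0.3, so any
# threshold just above 0.3 separates them perfectly.
thr = adaptive_thresholds(np.array([[0.8], [0.9], [0.3], [0.2]]),
                          np.array([[1], [1], [0], [0]]))
```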

Ensemble of Sequence Matching Networks for Dynamic Sound Event Localization, Detection, and Tracking

In order to estimate directions of arrival of moving sound sources, which require higher spatial resolution than static sources, this work proposes to separate the directional estimates into azimuth and elevation estimates before passing them to the sequence matching network.


This paper describes a three-stage system for the sound event localization and detection (SELD) task, which employs multi-resolution cochleagram features from 4-channel audio and a convolutional recurrent neural network (CRNN) model to detect sound activity.


This paper describes our contribution to the task of sound event localization and detection (SELD) using first-order ambisonic signals at the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge.

A Method of Sound Event Localization and Detection Based on Three-Dimension Convolution

A method based on three-dimensional convolutional feature extraction, called SELD3Dnet, is proposed; the results show that it improves the F1 and frame-recall metrics on average across various real-scene data subsets (ov1, ov2, ov3), validating the performance of the proposed method.

Sound Event Localization and Detection Based on Adaptive Hybrid Convolution and Multi-scale Feature Extractor

A method based on Adaptive Hybrid Convolution (AHConv) and a multi-scale feature extractor is proposed to capture dependencies along the time and frequency dimensions respectively, together with an adaptive attention block that integrates information from a very local to an exponentially enlarged receptive field.

A Sequence Matching Network for Polyphonic Sound Event Localization and Detection

A two-step approach that decouples the learning of the sound event detection and direction-of-arrival estimation systems is proposed, which allows flexibility in the system design and increases the performance of the whole sound event localization and detection system.

Sound Event Detection and Direction of Arrival Estimation using Residual Net and Recurrent Neural Networks

Deep residual nets originally used for image classification are adapted and combined with recurrent neural networks to estimate the onsets and offsets of sound events, their classes, and their directions in a reverberant environment, improving system performance on unseen data.



Rare Sound Event Detection Using 1D Convolutional Recurrent Neural Networks

The proposed system, using a combination of a 1D convolutional neural network and a recurrent neural network (RNN) with long short-term memory (LSTM) units, achieved first place in the challenge with an error rate of 0.13 and an F-score of 93.1.

Sound event detection using spatial features and convolutional recurrent neural network

This paper proposes to use low-level spatial features extracted from multichannel audio for sound event detection and shows that, instead of concatenating the features of each channel into a single feature vector, the network learns sound events in multichannel audio better when the channels are presented as separate layers of a volume.
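The contrast between the "separate layers of a volume" arrangement and plain concatenation shows up directly in the array shapes fed to the network; a minimal sketch (all dimensions are illustrative):

```python
import numpy as np

n_channels, n_frames, n_bins = 4, 128, 40
spectrograms = np.random.rand(n_channels, n_frames, n_bins)

# Volume: each audio channel becomes a separate input layer, so a 2-D
# convolution can relate channels at each time-frequency point.
volume = np.transpose(spectrograms, (1, 2, 0))        # (frames, bins, channels)

# Concatenation: channels flattened into one long feature vector per
# frame, discarding the channel axis a convolution could exploit.
concatenated = np.transpose(spectrograms, (1, 0, 2)).reshape(n_frames, -1)

print(volume.shape)        # (128, 40, 4)
print(concatenated.shape)  # (128, 160)
```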


This paper presents a multi-label bi-directional recurrent neural network to model the temporal evolution of sound events, and explores data augmentation techniques that have shown success in sound classification.

Polyphonic sound event detection using multi label deep neural networks

In this work, frame-wise spectral-domain features are used as inputs to train a deep neural network for multi-label classification, and the proposed method improves accuracy by 19 percentage points overall.

Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features

The proposed SED system is compared against the state-of-the-art single-channel method on the development subset of the TUT Sound Events Detection 2016 database, and the use of spatial and harmonic features is shown to improve SED performance.

Duration-Controlled LSTM for Polyphonic Sound Event Detection

This paper builds upon a state-of-the-art SED method that performs frame-by-frame detection using a bidirectional LSTM recurrent neural network, and incorporates a duration-controlled modeling technique based on a hidden semi-Markov model that makes it possible to model the duration of each sound event precisely and to perform sequence-by-sequence detection without having to resort to thresholding.

Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features

The proposed method learns to recognize overlapping sound events from multichannel features faster and performs better SED with fewer training epochs.

Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

This work combines these two approaches in a convolutional recurrent neural network (CRNN) and applies it on a polyphonic sound event detection task and observes a considerable improvement for four different datasets consisting of everyday sound events.

Robust sound event recognition using convolutional neural networks

This work proposes novel features derived from spectrogram energy triggering, allied with the powerful classification capabilities of a convolutional neural network (CNN), which demonstrates excellent performance under noise-corrupted conditions when compared against state-of-the-art approaches on standard evaluation tasks.

Direction of Arrival Estimation for Multiple Sound Sources Using Convolutional Recurrent Neural Network

The results show that the proposed DOAnet is capable of estimating the number of sources and their respective DOAs with good precision, and of generating spatial pseudo-spectra (SPS) with a high signal-to-noise ratio.
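Reading DOAs off a spatial pseudo-spectrum amounts to picking peaks over a discrete direction grid; a minimal one-dimensional azimuth sketch (the grid, threshold, and peak positions are illustrative, not from the paper):

```python
import numpy as np

def pick_doas(sps, azimuths, threshold=0.5):
    """Return azimuths at local maxima of a spatial pseudo-spectrum
    that exceed a magnitude threshold. sps and azimuths are 1-D
    arrays of equal length over a discrete azimuth grid."""
    peaks = []
    for i in range(1, len(sps) - 1):
        if sps[i] > threshold and sps[i] >= sps[i - 1] and sps[i] > sps[i + 1]:
            peaks.append(int(azimuths[i]))
    return peaks

az = np.arange(-180, 180, 10)          # 10-degree azimuth grid
sps = np.zeros_like(az, dtype=float)
sps[6], sps[24] = 0.9, 0.8             # two sources at -120 and 60 degrees
print(pick_doas(sps, az))              # [-120, 60]
```

A real system would interpolate between grid points and suppress nearby duplicate peaks; this sketch only shows the thresholded local-maximum idea.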