Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks

@article{Adavanne2019SoundEL,
  title={Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks},
  author={Sharath Adavanne and Archontis Politis and Joonas Nikunen and Tuomas Virtanen},
  journal={IEEE Journal of Selected Topics in Signal Processing},
  year={2019},
  volume={13},
  pages={34--48}
}
In this paper, we propose a convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space. [...] As the first output, sound event detection (SED) is performed as a multi-label classification task on each time frame, producing temporal activity for all the sound event classes.
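The frame-wise multi-label formulation of SED mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the logits, the two-class setup, the 0.5 threshold, and the helper names `frame_activity` and `active_segments` are all assumptions made for illustration.

```python
import math

def frame_activity(logits, threshold=0.5):
    """Map per-frame, per-class logits to binary activity.

    Each class gets its own sigmoid, so several classes can be
    active in the same frame (multi-label, not softmax).
    The 0.5 threshold is an assumed value.
    """
    return [
        [1 if 1.0 / (1.0 + math.exp(-z)) >= threshold else 0 for z in frame]
        for frame in logits
    ]

def active_segments(activity, class_index):
    """Return (onset_frame, offset_frame) pairs for one class, offset exclusive."""
    segments, start = [], None
    for t, frame in enumerate(activity):
        if frame[class_index] and start is None:
            start = t                      # event begins
        elif not frame[class_index] and start is not None:
            segments.append((start, t))    # event ends
            start = None
    if start is not None:                  # event still active at the last frame
        segments.append((start, len(activity)))
    return segments

# Hypothetical network outputs: 5 time frames x 2 sound event classes.
logits = [
    [2.0, -3.0],
    [1.5, 0.5],
    [-1.0, 2.0],
    [-2.0, 1.0],
    [3.0, -1.0],
]
activity = frame_activity(logits)
```

In this toy input both classes are active in the second frame, which is the overlapping case that a single-label classifier could not represent.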
Sound event localization and detection based on CRNN using rectangular filters and channel rotation data augmentation
TLDR
The proposed system is a convolutional recurrent neural network using rectangular filters specialized in recognizing significant spectral features related to the task, considerably improving Error Rate and F-score for location-aware detection.
Joint Measurement of Multi-channel Sound Event Detection and Localization Using Deep Neural Network
TLDR
This paper extracts the phase feature and amplitude feature of the sound spectrum from each audio channel, avoiding feature extraction limited by other microphone arrays.
U Recurrent Neural Network for Polyphonic Sound Event Detection and Localization
TLDR
A novel model called the U recurrent neural network (URNN) is proposed to alleviate problems of polyphonic sound event detection and localization; it combines low-level and high-level features in the model without significantly increasing computation costs and exploits the identity layer to make the network deeper.
Exploring Detection and Localization of Overlapping Sound Sources with Deep Learning
TLDR
This project has been submitted as a possible solution to Task 3, focused on sound event localization and detection, and has been evaluated using the same dataset provided for the DCASE 2020 Challenge Task 3: TAU-NIGENS Spatial Sound Events 2020.
Hierarchical Detection of Sound Events and their Localization Using Convolutional Neural Networks with Adaptive Thresholds
TLDR
The system is based on multi-channel convolutional neural networks, combined with data augmentation and ensembling, and follows a hierarchical approach that first determines adaptive thresholds for the multi-label sound event detection (SED) problem, based on a CNN operating on spectrograms over long-duration windows.
Ensemble of Sequence Matching Networks for Dynamic Sound Event Localization, Detection, and Tracking
TLDR
In order to estimate directions-of-arrival of moving sound sources with higher required spatial resolutions than those of static sources, this work proposes to separate the directional estimates into azimuth and elevation estimates before passing them to the sequence matching network.
THREE-STAGE APPROACH FOR SOUND EVENT LOCALIZATION AND DETECTION Technical Report
TLDR
This paper describes a three-stage system for the sound event localization and detection (SELD) task, which employs a multi-resolution cochleagram from 4-channel audio and a convolutional recurrent neural network (CRNN) model to detect sound activity.
ARBORESCENT NEURAL NETWORK ARCHITECTURES FOR SOUND EVENT DETECTION AND LOCALIZATION Technical Report
This paper describes our contribution to the task of sound event localization and detection (SELD) using first-order ambisonic signals at the Detection and Classification of Acoustic Scenes and
Sound Event Localization and Detection Based on Adaptive Hybrid Convolution and Multi-scale Feature Extractor
TLDR
A method is proposed based on Adaptive Hybrid Convolution (AHConv) and a multi-scale feature extractor, which capture dependencies along the time dimension and the frequency dimension respectively, together with an adaptive attention block that can integrate information from very local to exponentially enlarged receptive fields within the block.
A Sequence Matching Network for Polyphonic Sound Event Localization and Detection
TLDR
A two-step approach that decouples the learning of the sound event detection and direction-of-arrival estimation systems is proposed, which allows flexibility in the system design and increases the performance of the whole sound event localization and detection system.
...

References

SHOWING 1-10 OF 75 REFERENCES
Sound event detection using spatial features and convolutional recurrent neural network
TLDR
This paper proposes to use low-level spatial features extracted from multichannel audio for sound event detection and shows that, instead of concatenating the features of each channel into a single feature vector, the network learns sound events in multichannel audio better when they are presented as separate layers of a volume.
BIDIRECTIONAL GRU FOR SOUND EVENT DETECTION
TLDR
This paper presents a multi-label bi-directional recurrent neural network to model the temporal evolution of sound events, and explores data augmentation techniques that have shown success in sound classification.
Polyphonic sound event detection using multi label deep neural networks
TLDR
Frame-wise spectral-domain features are used as inputs to train a deep neural network for multi-label classification in this work, and the proposed method improves the accuracy by 19 percentage points overall.
Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features
TLDR
The proposed SED system is compared against the state-of-the-art mono-channel method on the development subset of the TUT sound events detection 2016 database, and the use of spatial and harmonic features is shown to improve the performance of SED.
Duration-Controlled LSTM for Polyphonic Sound Event Detection
TLDR
This paper builds upon a state-of-the-art SED method that performs frame-by-frame detection using a bidirectional LSTM recurrent neural network, and incorporates a duration-controlled modeling technique based on a hidden semi-Markov model that makes it possible to model the duration of each sound event precisely and to perform sequence-by-sequence detection without having to resort to thresholding.
Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features
TLDR
The proposed method learns to recognize overlapping sound events from multichannel features faster and performs better SED with fewer training epochs.
Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
TLDR
This work combines these two approaches in a convolutional recurrent neural network (CRNN), applies it to a polyphonic sound event detection task, and observes a considerable improvement for four different datasets consisting of everyday sound events.
Robust sound event recognition using convolutional neural networks
TLDR
This work proposes novel features derived from spectrogram energy triggering, allied with the powerful classification capabilities of a convolutional neural network (CNN), which demonstrates excellent performance under noise-corrupted conditions when compared against state-of-the-art approaches on standard evaluation tasks.
Direction of Arrival Estimation for Multiple Sound Sources Using Convolutional Recurrent Neural Network
TLDR
The results show that the proposed DOAnet is capable of estimating the number of sources and their respective DOAs with good precision and of generating SPS with a high signal-to-noise ratio.
Recurrent neural networks for polyphonic sound event detection in real life recordings
In this paper we present an approach to polyphonic sound event detection in real life recordings based on bi-directional long short term memory (BLSTM) recurrent neural networks (RNNs). A single
...