Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy

@inproceedings{Cao2019PolyphonicSE,
  title={Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy},
  author={Yin Cao and Qiuqiang Kong and Turab Iqbal and Fengyan An and Wenwu Wang and Mark D. Plumbley},
  booktitle={DCASE},
  year={2019}
}
Sound event detection (SED) and localization refer to recognizing sound events and estimating their spatial and temporal locations. The proposed two-stage method learns SED first, after which the learned feature layers are transferred for direction-of-arrival estimation (DOAE). It then uses the SED ground truth as a mask to train DOAE. The method is evaluated on the DCASE 2019 Task 3 dataset, which contains overlapping sound events recorded in different environments. Experimental results show that the proposed method is able to improve…
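
The two-stage idea above can be pictured with a minimal PyTorch sketch. Everything below is an illustrative assumption rather than the authors' released implementation: the module names (`CRNNBackbone`, `sed_head`, `doa_head`), the layer sizes, the 4-channel/64-mel input shape, and the exact way the SED ground truth masks the DOA regression loss.

```python
# Hedged sketch of two-stage SELD training: stage 1 trains the shared feature
# layers for SED; stage 2 reuses those layers and trains DOA estimation with
# the SED ground truth masking the regression loss. All names and sizes are
# illustrative assumptions, not the paper's published code.
import torch
import torch.nn as nn

class CRNNBackbone(nn.Module):
    """Shared convolutional-recurrent feature layers (learned in stage 1)."""
    def __init__(self, in_ch=4, hidden=64):      # assumes 4 input channels, 64 mel bins
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.AvgPool2d((1, 4)),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.AvgPool2d((1, 4)),
        )
        self.gru = nn.GRU(hidden * 4, hidden, batch_first=True, bidirectional=True)

    def forward(self, x):                        # x: (batch, channels, frames, mels)
        h = self.conv(x)                         # (batch, hidden, frames, mels // 16)
        h = h.permute(0, 2, 1, 3).flatten(2)     # (batch, frames, hidden * mels // 16)
        h, _ = self.gru(h)
        return h                                 # (batch, frames, 2 * hidden)

n_classes = 11
backbone = CRNNBackbone()
sed_head = nn.Linear(128, n_classes)             # frame-wise event activity logits
doa_head = nn.Linear(128, 2 * n_classes)         # assumed layout: [azimuths | elevations]

# Stage 1: train backbone + sed_head with a frame-wise binary SED loss.
sed_loss_fn = nn.BCEWithLogitsLoss()

# Stage 2: keep (or fine-tune) the transferred feature layers and train the DOA
# branch; the SED ground truth masks the loss so only active frames/classes count.
def masked_doa_loss(doa_pred, doa_target, sed_target):
    mask = sed_target.repeat(1, 1, 2)            # same mask for azimuth and elevation
    err = torch.abs(doa_pred - doa_target) * mask
    return err.sum() / mask.sum().clamp(min=1.0)

# Shape check with dummy input: (batch, channels, frames, mel bins).
feats = backbone(torch.randn(2, 4, 100, 64))
print(sed_head(feats).shape, doa_head(feats).shape)   # (2, 100, 11), (2, 100, 22)
```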

Citations

Two-Stage Sound Event Localization and Detection Using Intensity Vector and Generalized Cross-Correlation (Technical Report)
TLDR
A two-stage polyphonic sound event detection and localization method that can localize and detect overlapping sound events in different environments, improves the performance of both SED and DOA estimation, and performs significantly better than the baseline method.
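
For context on the spatial features named in this report's title, the following is a generic GCC-PHAT illustration between two microphone channels; the FFT size, framing, and how such features are combined with intensity vectors in that system are not specified here, and the function name `gcc_phat` is just a placeholder.

```python
# Generic GCC-PHAT sketch (illustration only, not the cited system's feature
# extraction): cross-power spectrum with phase-transform weighting, whose peak
# lag indicates the inter-channel time difference.
import numpy as np

def gcc_phat(sig, ref, n_fft=1024):
    X = np.fft.rfft(sig, n=n_fft)
    Y = np.fft.rfft(ref, n=n_fft)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-8                # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n_fft)
    return np.concatenate((cc[-n_fft // 2:], cc[:n_fft // 2]))  # lag 0 at centre

# Two channels with a 5-sample delay; the correlation peak recovers the lag.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y = np.roll(x, 5)
lags = np.arange(-512, 512)
print(lags[np.argmax(gcc_phat(y, x))])           # -> 5
```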
U Recurrent Neural Network for Polyphonic Sound Event Detection and Localization
TLDR
A novel model called U recurrent neural network (URNN) is proposed to alleviate problems of polyphonic sound event detection and localization; it combines low-level and high-level features without significantly increasing computation cost, and exploits the identity layer to make the network deeper.
An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection
TLDR
The proposed EINV2 for joint SED and DoA estimation outperforms previous methods by a large margin, and has comparable performance to state-of-the-art ensemble models.
Sound event localization and detection based on CRNN using rectangular filters and channel rotation data augmentation
TLDR
The proposed system is a convolutional recurrent neural network using rectangular filters specialized in recognizing significant spectral features related to the task, considerably improving Error Rate and F-score for location-aware detection.
A Sequence Matching Network for Polyphonic Sound Event Localization and Detection
TLDR
A two-step approach that decouples the learning of the sound event detection and direction-of-arrival estimation systems is proposed, which allows flexibility in the system design and increases the performance of the whole sound event localization and detection system.
A General Network Architecture for Sound Event Localization and Detection Using Transfer Learning and Recurrent Neural Network
TLDR
The experimental results on the DCASE 2020 SELD dataset show that the performance of the proposed network architecture, using different SED and DOA estimation algorithms and different audio formats, is competitive with other state-of-the-art SELD algorithms.
Ensemble of Sequence Matching Networks for Dynamic Sound Event Localization, Detection, and Tracking
TLDR
In order to estimate directions-of-arrival of moving sound sources, which require higher spatial resolution than static sources, this work proposes to separate the directional estimates into azimuth and elevation estimates before passing them to the sequence matching network.
Exploring Detection and Localization of Overlapping Sound Sources with Deep Learning
TLDR
This project was submitted as a possible solution to Task 3, which focuses on sound event localization and detection, and was evaluated using the dataset provided for the DCASE 2020 Challenge Task 3: TAU-NIGENS Spatial Sound Events 2020.
A two-step system for sound event localization and detection
TLDR
A two-step system for sound event localization and detection that combines the results of the event detector and the direction-of-arrival estimator, showing a significant improvement over the baseline solution in the DCASE 2019 Task 3 challenge.
DCASE 2020 Task 3: Ensemble of Sequence Matching Networks for Dynamic Sound Event Localization, Detection, and Tracking (Technical Report)
TLDR
In order to estimate directions-of-arrival of moving sound sources with high spatial resolution, this work proposes to separate the directional estimates into azimuth and elevation before passing them to the sequence matching network.
...
...

References

Showing 1-10 of 29 references
Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
TLDR
The proposed convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space is generic, applicable to any array structure, and robust to unseen DOA values, reverberation, and low-SNR scenarios.
Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems
TLDR
This paper proposes generic cross-task baseline systems based on convolutional neural networks (CNNs) and finds that the 9-layer CNN with average pooling is a good model for a majority of the DCASE 2019 tasks.
Metrics for Polyphonic Sound Event Detection
This paper presents and discusses various metrics proposed for the evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources.
Polyphonic sound event detection using multi label deep neural networks
TLDR
Frame-wise spectral-domain features are used as inputs to train a deep neural network for multi-label classification in this work, and the proposed method improves the accuracy by 19 percentage points overall.
Robust sound event recognition using convolutional neural networks
TLDR
This work proposes novel features derived from spectrogram energy triggering, allied with the powerful classification capabilities of a convolutional neural network (CNN), which demonstrates excellent performance under noise-corrupted conditions when compared against state-of-the-art approaches on standard evaluation tasks.
Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge
TLDR
The emergence of deep learning as the most popular classification method is observed, replacing the traditional approaches based on Gaussian mixture models and support vector machines.
Deep Neural Networks for Multiple Speaker Detection and Localization
TLDR
This paper proposes a likelihood-based encoding of the network output, which naturally allows the detection of an arbitrary number of sources, and investigates the use of sub-band cross-correlation information as features for better localization in sound mixtures.
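
The likelihood-based output encoding mentioned in this entry can be sketched as follows; the 1-degree azimuth grid, the Gaussian width, and the peak-picking threshold are assumed values for illustration, not that paper's exact parameterization.

```python
# Hedged sketch of a likelihood-style DOA output encoding: each ground-truth
# azimuth adds a Gaussian-shaped bump on a grid of candidate directions, so any
# number of sources can be decoded by thresholded peak picking.
import numpy as np

def encode_azimuths(azimuths_deg, n_bins=360, sigma_deg=8.0):
    grid = np.arange(n_bins)                     # 1-degree azimuth grid (assumption)
    target = np.zeros(n_bins)
    for az in azimuths_deg:
        diff = np.abs(grid - az)
        diff = np.minimum(diff, n_bins - diff)   # wrap-around angular distance
        target = np.maximum(target, np.exp(-diff ** 2 / (2 * sigma_deg ** 2)))
    return target                                # values in [0, 1]

def decode_azimuths(likelihood, threshold=0.5):
    n = len(likelihood)
    return [i for i in range(n)
            if likelihood[i] >= threshold
            and likelihood[i] >= likelihood[i - 1]          # wraps at i = 0
            and likelihood[i] >= likelihood[(i + 1) % n]]

print(decode_azimuths(encode_azimuths([30, 300])))          # -> [30, 300]
```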
Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
TLDR
This work combines these two approaches in a convolutional recurrent neural network (CRNN) and applies it on a polyphonic sound event detection task and observes a considerable improvement for four different datasets consisting of everyday sound events.
Recurrent neural networks for polyphonic sound event detection in real life recordings
In this paper we present an approach to polyphonic sound event detection in real life recordings based on bi-directional long short term memory (BLSTM) recurrent neural networks (RNNs). A single…
Computational Analysis of Sound Scenes and Events
TLDR
This book presents computational methods for extracting useful information from audio signals, collecting the state of the art in the field of sound event and scene analysis, and gives an overview of methods for the computational analysis of sound scenes and events.
...
...