• Corpus ID: 219260243

A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection

@article{Politis2020ADO,
  title={A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection},
  author={Archontis Politis and Sharath Adavanne and Tuomas Virtanen},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.01919}
}
This report presents the dataset and the evaluation setup of the Sound Event Localization & Detection (SELD) task for the DCASE 2020 Challenge. The SELD task refers to the problem of trying to simultaneously classify a known set of sound event classes, detect their temporal activations, and estimate their spatial directions or locations while they are active. To train and test SELD systems, datasets of diverse sound events occurring under realistic acoustic conditions are needed. Compared to… 

SOUND EVENT DETECTION AND LOCALIZATION USING CRNN MODELS Technical Report
TLDR
A Convolutional Recurrent Neural Network (CRNN) is developed that jointly predicts Sound Event Detection (SED) and Direction of Arrival (DOA) outputs, thereby minimizing problems with overlapping events.
Echo-aware Adaptation of Sound Event Localization and Detection in Unknown Environments
Our goal is to develop a sound event localization and detection (SELD) system that works robustly in unknown environments. A SELD system trained on known environment data is degraded in an unknown
SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection
TLDR
A novel feature called Spatial cue-Augmented Log-SpectrogrAm (SALSA) is proposed, with exact time-frequency mapping between the signal power and the source directional cues, which is crucial for resolving overlapping sound sources.
A COMBINATION OF VARIOUS NEURAL NETWORKS FOR SOUND EVENT LOCALIZATION AND DETECTION Technical Report
TLDR
A network architecture combining various network layers is proposed for optimal performance on the SELD task, along with a study of which augmentation techniques boost the performance of the proposed model with a limited training dataset.
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection
TLDR
A novel four-stage data augmentation approach to ResNet-Conformer based acoustic modeling for sound event localization and detection (SELD) that employs a ResNet-Conformer architecture to model both global and local context dependencies of an audio sequence to yield further gains over those architectures used in the DCASE 2020 SELD evaluations.
ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection
TLDR
In experimental evaluations with the DCASE 2020 Task 3 dataset, the ACCDOA representation outperformed the two-branch representation in SELD metrics with a smaller network size and performed better than state-of-the-art SELD systems in terms of localization and location-dependent detection.
LOCALIZATION AND DETECTION FOR MOVING SOUND SOURCES USING CONSECUTIVE ENSEMBLE OF 2D-CRNN Technical Report
TLDR
This technical report introduces a deep learning strategy for sound event localization and detection in DCASE 2020 Task 3 to get accurate estimation of both detecting and localizing moving sound events by splitting a task into five sub-tasks.
A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection
TLDR
To investigate the individual and combined effects of ambient noise, interferers, and reverberation, the baseline is evaluated on different versions of the dataset excluding or including combinations of these factors; the results indicate that by far the most detrimental effects are caused by directional interferers.
Wearable SELD dataset: Dataset for sound event localization and detection using wearable devices around head
TLDR
A dataset named Wearable SELD dataset is proposed, which consists of data recorded by 24 microphones placed on a head and torso simulator (HATS) with some accessories mimicking wearable devices (glasses, earphones, and headphones).
What Makes Sound Event Localization and Detection Difficult? Insights from Error Analysis
TLDR
Experimental results indicate polyphony as the main challenge in SELD, due to the difficulty in detecting all sound events of interest, and that SELD systems tend to make fewer errors for the polyphonic scenario that is dominant in the training set.

References

Sound Event Detection in the DCASE 2017 Challenge
TLDR
Analysis of the systems' behavior reveals that task-specific optimization has a big role in producing good performance; however, often this optimization closely follows the ranking metric, and its maximization/minimization does not result in universally good performance.
Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy
TLDR
Experimental results show that the proposed two-stage polyphonic sound event detection and localization method is able to improve the performance of both SED and DOAE, and also performs significantly better than the baseline method.
A multi-room reverberant dataset for sound event localization and detection
TLDR
This paper presents the sound event localization and detection (SELD) task setup for the DCASE 2019 challenge to detect the temporal activities of a known set of sound event classes, and further localize them in space when active.
Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
TLDR
The proposed convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space is generic and applicable to any array structures, robust to unseen DOA values, reverberation, and low SNR scenarios.
Joint Measurement of Localization and Detection of Sound Events
TLDR
This paper proposes augmentation of the localization metrics with a condition related to the detection, and conversely, use of location information in calculating the true positives for detection.
Hierarchical Detection of Sound Events and their Localization Using Convolutional Neural Networks with Adaptive Thresholds
TLDR
The system is based on multi-channel convolutional neural networks, combined with data augmentation and ensembling, and follows a hierarchical approach that first determines adaptive thresholds for the multi-label sound event detection (SED) problem, based on a CNN operating on spectrograms over long-duration windows.
Sound Event Detection and Direction of Arrival Estimation using Residual Net and Recurrent Neural Networks
TLDR
Deep residual nets originally used for image classification are adapted and combined with recurrent neural networks to estimate the onset-offset of sound events, sound events class, and their direction in a reverberant environment to improve the system performance on unseen data.
Sound Event Localization and Detection using CRNN Architecture with Mixup for Model Generalization
TLDR
The proposed architecture is based on a Convolutional Recurrent Neural Network (CRNN) and introduces rectangular kernels in the pooling layers to minimize the information loss in the temporal dimension within the CNN module, which boosts the performance of the RNN module.
GCC-PHAT Cross-Correlation Audio Features for Simultaneous Sound Event Localization and Detection (SELD) on Multiple Rooms
In this work, we show a simultaneous sound event localization and detection (SELD) system, with enhanced acoustic features, in which we propose using the well-known Generalized Cross Correlation