Corpus ID: 235422474

A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection

@inproceedings{Politis2021ADO,
  title={A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection},
  author={Archontis Politis and Sharath Adavanne and Daniel Krause and Antoine Deleforge and Prerak Srivastava and Tuomas Virtanen},
  booktitle={DCASE},
  year={2021}
}
This report presents the dataset and baseline of Task 3 of the DCASE2021 Challenge on Sound Event Localization and Detection (SELD). The dataset is based on emulation of real recordings of static or moving sound events under real conditions of reverberation and ambient noise, using spatial room impulse responses captured in a variety of rooms and delivered in two spatial formats. The acoustical synthesis remains the same as in the previous iteration of the challenge; however, the new dataset…

STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events
TLDR
Results of the baseline indicate that with a suitable training strategy a reasonable detection and localization performance can be achieved on real sound scene recordings.
Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection
TLDR
An impulse response simulation framework (IRS) that augments spatial characteristics using simulated room impulse responses (RIR) and an ablation study to discuss the contribution and need for each component within the IRS.
Spatial mixup: Directional loudness modification as data augmentation for sound event localization and detection
TLDR
Spatial Mixup is proposed, as an application of parametric spatial audio effects for data augmentation, which modifies the directional properties of a multi-channel spatial audio signal encoded in the ambisonics domain, enabling deep learning models to achieve invariance to small spatial perturbations.
DCASE 2021 Task 3: Spectrotemporally-aligned Features for Polyphonic Sound Event Localization and Detection
TLDR
This work proposes a novel feature called spatial cue-augmented log-spectrogram (SALSA), with exact time-frequency mapping between the signal power and the source direction-of-arrival, and combines several models with slightly different architectures, trained on the new feature, to further improve system performance for the DCASE sound event localization and detection challenge.
Data Augmentation and Squeeze-and-Excitation Network on Multiple Dimension for Sound Event Localization and Detection in Real Scenes
TLDR
This work applies a Squeeze-and-Excitation block on the channel and frequency dimensions to efficiently extract feature characteristics, and proposes an original data augmentation method named Moderate Mixup to simulate situations where a noise floor or interfering events exist.
Assessment of Self-Attention on Learned Features For Sound Event Localization and Detection
TLDR
The effect of MHSA on the SELD task is studied in detail, including the effects of replacing the RNN blocks with self-attention layers, and the effect of position embeddings and layer normalization.
SALSA-Lite: A Fast and Effective Feature for Polyphonic Sound Event Localization and Detection with Microphone Arrays
TLDR
Experimental results on the TAU-NIGENS Spatial Sound Events 2021 dataset showed that the SALSA-Lite feature achieved competitive performance compared to the full SALSA feature, and significantly outperformed the traditional feature set of multichannel log-mel spectrograms with generalized cross-correlation spectra.
SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection
TLDR
A novel feature called Spatial cue-Augmented Log-SpectrogrAm (SALSA) is proposed, with exact time-frequency mapping between the signal power and the source directional cues, which is crucial for resolving overlapping sound sources.
What Makes Sound Event Localization and Detection Difficult? Insights from Error Analysis
TLDR
Experimental results indicate polyphony as the main challenge in SELD, due to the difficulty in detecting all sound events of interest, and the SELD systems tend to make fewer errors for the polyphonic scenario that is dominant in the training set.
A Method Based on Dual Cross-Modal Attention and Parameter Sharing for Polyphonic Sound Event Localization and Detection
TLDR
Experimental results demonstrate that the efficient model using one common decoder block based on the DCMA to predict multiple events in the track-wise output format is effective for the SELD task with up to three overlapping events.

References

A multi-room reverberant dataset for sound event localization and detection
TLDR
This paper presents the sound event localization and detection (SELD) task setup for the DCASE 2019 challenge to detect the temporal activities of a known set of sound event classes, and further localize them in space when active.
The LOCATA Challenge: Acoustic Source Localization and Tracking
TLDR
A review of relevant localization and tracking algorithms and, within the context of the existing literature, a detailed evaluation and dissemination of the LOCATA submissions are provided.
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection
TLDR
A novel four-stage data augmentation approach to ResNet-Conformer based acoustic modeling for sound event localization and detection (SELD) that employs a ResNet-Conformer architecture to model both global and local context dependencies of an audio sequence, yielding further gains over the architectures used in the DCASE 2020 SELD evaluations.
Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019
TLDR
An overview of the first international evaluation on sound event localization and detection, organized as a task of the DCASE 2019 Challenge, presents in detail how the systems were evaluated and ranked and the characteristics of the best-performing systems.
Ensemble of Sequence Matching Networks for Dynamic Sound Event Localization, Detection, and Tracking
TLDR
In order to estimate directions-of-arrival of moving sound sources with higher required spatial resolutions than those of static sources, this work proposes to separate the directional estimates into azimuth and elevation estimates before passing them to the sequence matching network.
Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
TLDR
The proposed convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space is generic and applicable to any array structure, and is robust to unseen DOA values, reverberation, and low SNR scenarios.
Scream and gunshot detection and localization for audio-surveillance systems
This paper describes an audio-based video surveillance system which automatically detects anomalous audio events in a public square, such as screams or gunshots, and localizes the position of the
The NIGENS General Sound Events Database
TLDR
NIGENS, a database with 714 WAV files containing isolated high-quality sound events of 14 different types, plus 303 'general' WAV files of anything other than these 14 types, is released and presented.
On Multitask Loss Function for Audio Event Detection and Localization
TLDR
This work proposes a multitask regression model, in which both (multi-label) event detection and localization are formulated as regression problems and use the mean squared error loss homogeneously for model training.
A General Network Architecture for Sound Event Localization and Detection Using Transfer Learning and Recurrent Neural Network
TLDR
The experimental results using the DCASE 2020 SELD dataset show that the performances of the proposed network architecture using different SED and DOA estimation algorithms and different audio formats are competitive with other state-of-the-art SELD algorithms.