A multi-room reverberant dataset for sound event localization and detection

@inproceedings{Adavanne2019AMR,
  title={A multi-room reverberant dataset for sound event localization and detection},
  author={Sharath Adavanne and Archontis Politis and Tuomas Virtanen},
  booktitle={DCASE},
  year={2019}
}
This paper presents the sound event localization and detection (SELD) task setup for the DCASE 2019 challenge. [...] These sound events are spatialized using real-life impulse responses collected at multiple spatial coordinates in five different rooms with varying dimensions and material properties. A baseline SELD method employing a convolutional recurrent neural network is used to generate benchmark scores for this reverberant dataset. The benchmark scores are obtained using the recommended cross…
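As a rough illustration of what spatializing a sound event with an impulse response involves, the sketch below convolves a mono signal with one impulse response per output channel. It is not the dataset's actual generation pipeline, which is not described here. The function name `spatialize`, the four-channel layout, the toy impulse response, and the random-noise "event" are all assumptions made for the example.

```python
import numpy as np

def spatialize(event, rir):
    """Convolve a mono event with a multichannel impulse response.

    event: shape (n_samples,)
    rir:   shape (n_channels, ir_len), one impulse response per channel
    Returns an array of shape (n_channels, n_samples + ir_len - 1).
    """
    event = np.asarray(event, dtype=float)
    rir = np.asarray(rir, dtype=float)
    # Full linear convolution, applied independently to each channel.
    return np.stack([np.convolve(event, channel_ir) for channel_ir in rir])

# Toy data (hypothetical): random noise stands in for a recorded sound event.
rng = np.random.default_rng(0)
event = rng.standard_normal(480)

# Toy 4-channel impulse response: a direct path plus a single reflection.
rir = np.zeros((4, 64))
rir[:, 0] = 1.0
rir[:, 10] = [0.5, -0.5, 0.25, -0.25]

spatial = spatialize(event, rir)
print(spatial.shape)  # (4, 543)
```

With measured impulse responses, each row of `rir` would instead hold the response recorded for one channel at a given source position in a given room.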

A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection
TLDR
To investigate the individual and combined effects of ambient noise, interferers, and reverberation, the baseline is evaluated on versions of the dataset that exclude or include combinations of these factors; the results indicate that directional interferers cause by far the most detrimental effects.
SOUND EVENT DETECTION AND LOCALIZATION USING CRNN MODELS Technical Report
TLDR
A Convolutional Recurrent Neural Network (CRNN) is developed that jointly predicts Sound Event Detection (SED) and Direction of Arrival (DOA), thereby minimizing the overlapping problems.
SECL-UMons Database for Sound Event Classification and Localization
TLDR
The DCASE 2019 challenge baseline (SELDnet) employing a convolutional recurrent neural network is used to generate benchmark scores for the new SECL-UMons dataset for sound event classification and localization in the context of office environments.
SOUND EVENT LOCALIZATION AND DETECTION USING FOA DOMAIN SPATIAL AUGMENTATION Technical Report
TLDR
The proposed spatial augmentation enables the system, submitted to DCASE 2019 Task 3 (Sound Event Localization and Detection), to augment direction of arrival (DOA) labels without losing the physical relationships between steering vectors and observations.
Sound source detection, localization and classification using consecutive ensemble of CRNN models
TLDR
This paper uses four CRNN SELDnet-like single-output models, run consecutively, to recover all available information about occurring events. The SELD task is decomposed into estimating the number of active sources, estimating the direction of arrival of a single source, estimating the direction of arrival of the second source when the direction of the first is known, and a multi-label classification task.
Metric optimization for Sound Event Localization and Detection
TLDR
Three methods are proposed (soft F-loss with temporal masking, periodic loss, and a PoolNet-based architecture) to handle three issues: dataset imbalance, the choice of pooling size, and the periodicity of angles.
Sound Event Localization and Detection Using CRNN on Pairs of Microphones
TLDR
This paper proposes sound event localization and detection methods for multichannel recordings, based on two Convolutional Recurrent Neural Networks to perform sound event detection (SED) and time difference of arrival (TDOA) estimation on each pair of microphones in a microphone array.
A Track-Wise Ensemble Event Independent Network for Polyphonic Sound Event Localization and Detection
TLDR
A track-wise ensemble event-independent network with a novel data augmentation method is proposed. It builds on the previously proposed Event-Independent Network V2, extended with conformer blocks and dense blocks, and addresses a problem in ensembling models with a track-wise output format: track permutation may occur among different models.
DCASE 2021 Task 3: Spectrotemporally-aligned Features for Polyphonic Sound Event Localization and Detection
TLDR
This work proposes a novel feature called spatial cue-augmented log-spectrogram (SALSA) with exact time-frequency mapping between the signal power and the source direction-of-arrival, and combined several models with slightly different architectures that were trained on the new feature to further improve the system performances for the DCASE sound event localization and detection challenge.
A Hybrid Parametric-Deep Learning Approach for Sound Event Localization and Detection
TLDR
The proposed methodology relies on parametric spatial audio analysis for source localization and detection, combined with a deep learning-based monophonic event classifier, to reduce the localization error on the evaluation dataset.

References

Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
TLDR
The proposed convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space is generic and applicable to any array structures, robust to unseen DOA values, reverberation, and low SNR scenarios.
Metrics for Polyphonic Sound Event Detection
This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources
Sound-model-based acoustic source localization using distributed microphone arrays
TLDR
A new source localization technique is proposed that works jointly with an acoustic event detection system and it seems that the proposed model-based approach can be an alternative to current techniques for event-based localization.
Two-source acoustic event detection and localization: Online implementation in a Smart-room
TLDR
This work implemented online two-source acoustic event detection and localization algorithms in a Smart-room, a closed space equipped with multiple microphones, showing high recognition accuracy for most acoustic events, both isolated and overlapped with speech.
Detection, classification and localization of acoustic events in the presence of background noise for acoustic surveillance of hazardous situations
TLDR
It is found that the engineered algorithms are sufficiently robust in moderately intense noise to be applied to practical audio-visual surveillance systems.
Comparing modeled and measurement-based spherical harmonic encoding filters for spherical microphone arrays
  • A. Politis, H. Gamper
  • Engineering, Physics
    2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
  • 2017
TLDR
A flexible filter design approach is presented that combines the benefits of previous methods and is suitable for deriving both modeled and measurement-based filters.
Apples-to-apples in cross-validation studies: pitfalls in classifier performance measurement
TLDR
It is shown by experiment that all but one of these computation methods lead to biased measurements, especially under high class imbalance, which is of particular interest to those designing machine learning software libraries and researchers focused on high class imbalance.
Direction of Arrival Estimation for Multiple Sound Sources Using Convolutional Recurrent Neural Network
TLDR
The results show that the proposed DOAnet is capable of estimating the number of sources and their respective DOAs with good precision, and of generating SPS with a high signal-to-noise ratio.
Direction of arrival estimation for multiple sound sources using convolutional recurrent neural network, in European Signal Processing Conference, 2018.
  • Applied Sciences