• Corpus ID: 220040070

TWO-STAGE SOUND EVENT LOCALIZATION AND DETECTION USING INTENSITY VECTOR AND GENERALIZED CROSS-CORRELATION Technical Report

@inproceedings{Cao2019TWOSTAGESE,
  title={TWO-STAGE SOUND EVENT LOCALIZATION AND DETECTION USING INTENSITY VECTOR AND GENERALIZED CROSS-CORRELATION Technical Report},
  author={Yin Cao and Turab Iqbal and Qiuqiang Kong and Miguel Galindo and Wenwu Wang and Mark D. Plumbley},
  year={2019}
}
Sound event localization and detection (SELD) refers to the spatial and temporal localization of sound events in addition to classification. The Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Task 3 introduces a strongly labelled dataset to address this problem. In this report, a two-stage polyphonic sound event detection and localization method is proposed. The method uses log mel features for event detection, and intensity vector and GCC features for localization… 
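To make the generalized cross-correlation (GCC) feature mentioned in the abstract concrete, here is a minimal sketch of a delay estimate between two microphone channels. It assumes the common PHAT weighting, which the abstract does not specify, and it uses a naive DFT from the standard library so that it runs without dependencies. The names `dft` and `gcc_phat` are illustrative and are not taken from the report.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(sig, ref, eps=1e-12):
    """Return (delay_in_samples, correlation) of `sig` relative to `ref`.

    PHAT weighting (an assumption here): the cross-power spectrum is divided
    by its magnitude, so only the phase difference between channels remains.
    """
    n = len(sig)
    if len(ref) != n:
        raise ValueError("sig and ref must have the same length")
    size = 2 * n  # zero-pad so the circular correlation does not wrap around
    S = dft(list(sig) + [0.0] * n)
    R = dft(list(ref) + [0.0] * n)
    cross = [s * r.conjugate() for s, r in zip(S, R)]
    weighted = [c / (abs(c) + eps) for c in cross]
    cc = [v.real for v in dft(weighted, inverse=True)]
    # Indices 0..n-1 are non-negative lags; indices n..2n-1 are negative lags.
    peak = max(range(size), key=lambda i: cc[i])
    delay = peak if peak < n else peak - size
    return delay, cc


if __name__ == "__main__":
    rng = random.Random(0)
    src = [rng.gauss(0.0, 1.0) for _ in range(29)]
    ref = src + [0.0] * 3   # reference channel
    sig = [0.0] * 3 + src   # same signal arriving 3 samples later
    delay, _ = gcc_phat(sig, ref)
    print("estimated delay (samples):", delay)
```

A positive delay means `sig` lags `ref`. In a learned localization system, the correlation sequence itself (rather than only its peak) would typically be passed to the network as a feature, but how the report does this is not stated in the truncated abstract.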

SOUND EVENT DETECTION AND LOCALIZATION USING CRNN MODELS Technical Report
TLDR
A Convolutional Recurrent Neural Network (CRNN) is developed that jointly predicts Sound Event Detection (SED) and Direction of Arrival (DOA) outputs, thereby minimizing the problems caused by overlapping sound events.
A Sequence Matching Network for Polyphonic Sound Event Localization and Detection
TLDR
A two-step approach that decouples the learning of the sound event detection and direction-of-arrival estimation systems is proposed, which allows flexibility in the system design and increases the performance of the whole sound event localization and detection system.
SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection
TLDR
A novel feature called Spatial cue-Augmented Log-SpectrogrAm (SALSA) is proposed, with exact time-frequency mapping between the signal power and the source directional cues, which is crucial for resolving overlapping sound sources.
Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi-Task Learning
TLDR
A new SELD method based on multiple direction of arrival (DOA) beamforming and multi-task learning, which achieves state-of-the-art performance on the DCASE 2019 SELD task, is proposed.
An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection
TLDR
The proposed EINV2 for joint SED and DoA estimation outperforms previous methods by a large margin, and has comparable performance to state-of-the-art ensemble models.
Sound Event Localization and Detection using Squeeze-Excitation Residual CNNs
TLDR
This work aims to improve the accuracy results of the baseline CRNN presented in DCASE 2020 Task 3 by adding residual squeeze-excitation blocks in the convolutional part of the CRNN.
Sound Event Localization Based on Sound Intensity Vector Refined by Dnn-Based Denoising and Source Separation
TLDR
The sound intensity vectors (IVs) for physics-based DOA estimation are refined using DNN-based denoising and source separation, and this method enables accurate DOA estimation for both single and overlapping sources using a spherical microphone array.
Sound Event Localization and Detection using CRNN Architecture with Mixup for Model Generalization
TLDR
The proposed architecture is based on a Convolutional Recurrent Neural Network (CRNN) and introduces rectangular kernels in the pooling layers to minimize information loss in the temporal dimension within the CNN module, which boosts the performance of the RNN module.
Sound source detection, localization and classification using consecutive ensemble of CRNN models
TLDR
This paper decomposes the SELD task into four subtasks: estimating the number of active sources, estimating the direction of arrival of a single source, estimating the direction of arrival of the second source when the direction of the first one is known, and multi-label classification. Four SELDnet-like single-output CRNN models, run consecutively, recover all available information about the occurring events.
TASK 3 DCASE 2020: SOUND EVENT LOCALIZATION AND DETECTION USING RESIDUAL SQUEEZE-EXCITATION CNNS Technical Report
TLDR
This work aims to improve the accuracy results of the baseline CRNN by adding residual squeeze-excitation blocks in the convolutional part of the CRNN, and shows that by simply introducing the residual SE blocks, the results obtained in the development phase clearly exceed the baseline.
...
...

References

SHOWING 1-10 OF 13 REFERENCES
Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy
TLDR
Experimental results show that the proposed two-stage polyphonic sound event detection and localization method is able to improve the performance of both SED and DOAE, and also performs significantly better than the baseline method.
Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
TLDR
The proposed convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space is generic, applicable to any array structure, and robust to unseen DOA values, reverberation, and low SNR scenarios.
Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems
TLDR
This paper proposes generic cross-task baseline systems based on convolutional neural networks (CNNs) and finds that the 9-layer CNN with average pooling is a good model for a majority of the DCASE 2019 tasks.
Metrics for Polyphonic Sound Event Detection
This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources
Deep Neural Networks for Multiple Speaker Detection and Localization
TLDR
This paper proposes a likelihood-based encoding of the network output, which naturally allows the detection of an arbitrary number of sources, and investigates the use of sub-band cross-correlation information as features for better localization in sound mixtures.
A neural network based algorithm for speaker localization in a multi-room environment
TLDR
A Speaker Localization algorithm based on Neural Networks for multi-room domestic scenarios is proposed and outperforms the reference one, providing an average localization error, expressed in terms of RMSE, equal to 525 mm against 1465 mm.
A learning-based approach to direction of arrival estimation in noisy and reverberant environments
TLDR
A learning-based approach that can learn from a large amount of simulated noisy and reverberant microphone array inputs for robust DOA estimation and uses a multilayer perceptron neural network to learn the nonlinear mapping from such features to the DOA.
Indoor Sound Source Localization With Probabilistic Neural Network
TLDR
Results show that the proposed GCA can localize accurately and robustly for diverse indoor applications where the site acoustic features can be studied prior to the localization stage, and has increased the success rate on direction of arrival estimation significantly with good robustness to environmental changes.
The generalized correlation method for estimation of time delay
A maximum likelihood (ML) estimator is developed for determining time delay between signals received at two spatially separated sensors in the presence of uncorrelated noise. This ML estimator can be
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
TLDR
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
...
...