TWO-STAGE SOUND EVENT LOCALIZATION AND DETECTION USING INTENSITY VECTOR AND GENERALIZED CROSS-CORRELATION Technical Report
@inproceedings{Cao2019TWOSTAGESE,
  title={TWO-STAGE SOUND EVENT LOCALIZATION AND DETECTION USING INTENSITY VECTOR AND GENERALIZED CROSS-CORRELATION Technical Report},
  author={Yin Cao and Turab Iqbal and Qiuqiang Kong and Miguel Galindo and Wenwu Wang and Mark D. Plumbley},
  year={2019}
}
Sound event localization and detection (SELD) refers to the spatial and temporal localization of sound events in addition to classification. The Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Task 3 introduces a strongly labelled dataset to address this problem. In this report, a two-stage polyphonic sound event detection and localization method is proposed. The method utilizes log mel features for event detection, and uses intensity vector and GCC features for localization…
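As a rough illustration of the kind of GCC feature the abstract mentions, the sketch below computes a generalized cross-correlation between two microphone signals with NumPy. It assumes the PHAT weighting, which the abstract does not specify, and the names `gcc_phat`, `estimate_delay`, and `max_lag` are hypothetical choices for this example, not taken from the report.

```python
import numpy as np


def gcc_phat(x, y, max_lag):
    """GCC with PHAT weighting for lags -max_lag..max_lag (in samples).

    Assumption: PHAT weighting is used; other GCC weightings exist.
    """
    n = len(x) + len(y)                 # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12      # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that negative lags come first and lag 0 sits at index max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, cc


def estimate_delay(x, y, max_lag):
    """Lag (in samples) at which x best aligns with y; positive if x lags y."""
    lags, cc = gcc_phat(x, y, max_lag)
    return int(lags[np.argmax(cc)])


# Usage: delay a noise signal by 5 samples and estimate the delay.
rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
delayed = np.concatenate((np.zeros(5), s[:-5]))
print(estimate_delay(delayed, s, max_lag=16))
```

In a localization front end, the correlation vector `cc` for each microphone pair and frame would typically be stacked as an input feature rather than reduced to a single delay; the peak-picking here is only to make the sketch easy to check.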
20 Citations
SOUND EVENT DETECTION AND LOCALIZATION USING CRNN MODELS Technical Report
- Computer Science, Physics
- 2020
A Convolutional Recurrent Neural Network (CRNN) is developed that jointly performs Sound Event Detection (SED) and Direction of Arrival (DOA) estimation, thereby minimizing problems caused by overlapping sound events.
A Sequence Matching Network for Polyphonic Sound Event Localization and Detection
- Computer Science
- ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
A two-step approach that decouples the learning of the sound event detection and direction-of-arrival estimation systems is proposed, which allows flexibility in the system design and increases the performance of the whole sound event localization and detection system.
SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection
- Physics, Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2022
A novel feature called Spatial cue-Augmented Log-SpectrogrAm (SALSA) is proposed, with exact time-frequency mapping between the signal power and the source directional cues, which is crucial for resolving overlapping sound sources.
Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi-Task Learning
- Computer Science
- INTERSPEECH
- 2020
A new SELD method based on multiple direction of arrival (DOA) beamforming and multi-task learning is proposed, which achieves state-of-the-art performance on DCASE 2019 SELD.
An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection
- Computer Science
- ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2021
The proposed EINV2 for joint SED and DoA estimation outperforms previous methods by a large margin, and has comparable performance to state-of-the-art ensemble models.
Sound Event Localization and Detection using Squeeze-Excitation Residual CNNs
- Computer Science
- ArXiv
- 2020
This work aims to improve the accuracy results of the baseline CRNN presented in DCASE 2020 Task 3 by adding residual squeeze-excitation blocks in the convolutional part of the CRNN.
Sound Event Localization Based on Sound Intensity Vector Refined by Dnn-Based Denoising and Source Separation
- Computer Science
- ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
The sound intensity vectors (IVs) for physics-based DOA estimation are refined using DNN-based denoising and source separation, and this method enables accurate DOA estimation for both single and overlapping sources using a spherical microphone array.
Sound Event Localization and Detection using CRNN Architecture with Mixup for Model Generalization
- Computer Science
- DCASE
- 2019
The proposed architecture is based on a Convolutional-Recurrent Neural Network (CRNN) and introduces rectangular kernels in the pooling layers to minimize information loss in the temporal dimension within the CNN module, which boosts the performance of the RNN module.
Sound source detection, localization and classification using consecutive ensemble of CRNN models
- Computer Science
- DCASE
- 2019
This paper uses four CRNN SELDnet-like single-output models, run consecutively, to recover all possible information about occurring events. The SELD task is decomposed into estimating the number of active sources, estimating the direction of arrival of a single source, estimating the direction of arrival of the second source when the direction of the first one is known, and a multi-label classification task.
TASK 3 DCASE 2020: SOUND EVENT LOCALIZATION AND DETECTION USING RESIDUAL SQUEEZE-EXCITATION CNNS Technical Report
- Computer Science
- 2020
This work aims to improve the accuracy results of the baseline CRNN by adding residual squeeze-excitation blocks in the convolutional part of the CRNN, and shows that by simply introducing the residual SE blocks, the results obtained in the development phase clearly exceed the baseline.
References
Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy
- Computer Science, Engineering
- DCASE
- 2019
Experimental results show that the proposed two-stage polyphonic sound event detection and localization method is able to improve the performance of both SED and DOAE, and also performs significantly better than the baseline method.
Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
- Computer Science
- IEEE Journal of Selected Topics in Signal Processing
- 2019
The proposed convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space is generic, applicable to any array structure, and robust to unseen DOA values, reverberation, and low SNR scenarios.
Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems
- Computer Science
- ArXiv
- 2019
This paper proposes generic cross-task baseline systems based on convolutional neural networks (CNNs) and finds that the 9-layer CNN with average pooling is a good model for a majority of the DCASE 2019 tasks.
Metrics for Polyphonic Sound Event Detection
- Computer Science
- 2016
This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources…
Deep Neural Networks for Multiple Speaker Detection and Localization
- Computer Science
- 2018 IEEE International Conference on Robotics and Automation (ICRA)
- 2018
This paper proposes a likelihood-based encoding of the network output, which naturally allows the detection of an arbitrary number of sources, and investigates the use of sub-band cross-correlation information as features for better localization in sound mixtures.
A neural network based algorithm for speaker localization in a multi-room environment
- Computer Science
- 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)
- 2016
A Speaker Localization algorithm based on Neural Networks for multi-room domestic scenarios is proposed and outperforms the reference one, providing an average localization error, expressed in terms of RMSE, equal to 525 mm against 1465 mm.
A learning-based approach to direction of arrival estimation in noisy and reverberant environments
- Computer Science
- 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2015
A learning-based approach for robust DOA estimation is presented that learns from a large amount of simulated noisy and reverberant microphone array inputs and uses a multilayer perceptron neural network to learn the nonlinear mapping from such features to the DOA.
Indoor Sound Source Localization With Probabilistic Neural Network
- Computer Science
- IEEE Transactions on Industrial Electronics
- 2018
Results show that the proposed GCA can localize accurately and robustly for diverse indoor applications where the site acoustic features can be studied prior to the localization stage, and has increased the success rate on direction of arrival estimation significantly with good robustness to environmental changes.
The generalized correlation method for estimation of time delay
- Engineering
- 1976
A maximum likelihood (ML) estimator is developed for determining time delay between signals received at two spatially separated sensors in the presence of uncorrelated noise. This ML estimator can be…
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Computer Science
- ICML
- 2015
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.