Corpus ID: 220040070

TWO-STAGE SOUND EVENT LOCALIZATION AND DETECTION USING INTENSITY VECTOR AND GENERALIZED CROSS-CORRELATION Technical Report

@inproceedings{Cao2019TWOSTAGESE,
  title={TWO-STAGE SOUND EVENT LOCALIZATION AND DETECTION USING INTENSITY VECTOR AND GENERALIZED CROSS-CORRELATION Technical Report},
  author={Yin Cao and Turab Iqbal and Qiuqiang Kong and Miguel Galindo and Wenwu Wang and Mark D. Plumbley},
  year={2019}
}
Sound event localization and detection (SELD) refers to the spatial and temporal localization of sound events in addition to their classification. Task 3 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge introduces a strongly labelled dataset to address this problem. In this report, a two-stage polyphonic sound event detection and localization method is proposed. The method uses log mel features for event detection, and intensity vector and generalized cross-correlation (GCC) features for localization…
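The intensity vector feature named in the abstract can be computed per time-frequency bin from first-order Ambisonics (FOA) STFT coefficients as the real part of the conjugated omnidirectional channel times the three dipole channels. The following is a minimal NumPy sketch, not the report's feature extractor: the W, X, Y, Z channel order, the plain unit-norm normalization, and the omission of any mel filterbank or other post-processing are assumptions.

```python
import numpy as np

def intensity_vector(foa_stft, eps=1e-8):
    """Active intensity vector for every time-frequency bin of an FOA STFT.

    foa_stft: complex array of shape (4, frames, bins), channel order assumed
    to be W, X, Y, Z (an assumption; check the dataset's convention).
    Returns a real array of shape (3, frames, bins) with unit-length vectors.
    """
    w = foa_stft[0]
    xyz = foa_stft[1:]
    # I = Re{conj(W) * [X, Y, Z]}; with the encoding assumed here it points
    # toward the source (some formulations use the opposite sign).
    iv = np.real(np.conj(w)[None, :, :] * xyz)
    # Normalize each bin so that only the direction information remains.
    norm = np.linalg.norm(iv, axis=0, keepdims=True)
    return iv / (norm + eps)

# Usage: a synthetic plane wave from azimuth 30 deg, elevation 10 deg
rng = np.random.default_rng(0)
s = rng.standard_normal((10, 5)) + 1j * rng.standard_normal((10, 5))
az, el = np.deg2rad(30.0), np.deg2rad(10.0)
d = np.array([np.cos(az) * np.cos(el), np.sin(az) * np.cos(el), np.sin(el)])
foa = np.stack([s, d[0] * s, d[1] * s, d[2] * s])
iv = intensity_vector(foa)
azimuth_deg = np.degrees(np.arctan2(iv[1], iv[0]))  # close to 30 in every bin
```

Normalizing away the magnitude leaves a direction estimate per bin, which is why such features suit the localization stage while log mel features carry the class information.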

Citations

SOUND EVENT DETECTION AND LOCALIZATION USING CRNN MODELS Technical Report

A convolutional recurrent neural network (CRNN) is developed that jointly predicts sound event detection (SED) and direction of arrival (DOA) outputs, with the aim of reducing problems caused by overlapping sound events.

A Sequence Matching Network for Polyphonic Sound Event Localization and Detection

A two-step approach that decouples the learning of the sound event detection and direction-of-arrival estimation systems is proposed; it allows flexibility in system design and improves the performance of the whole sound event localization and detection system.

SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection

A novel feature, Spatial cue-Augmented Log-SpectrogrAm (SALSA), is proposed. It provides an exact time-frequency mapping between the signal power and the source directional cues, which is crucial for resolving overlapping sound sources.

Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi-Task Learning

A new SELD method based on multiple direction-of-arrival (DOA) beamforming and multi-task learning is proposed; it achieves state-of-the-art performance on the DCASE 2019 SELD task.

An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection

The proposed EINV2 for joint SED and DoA estimation outperforms previous methods by a large margin, and has comparable performance to state-of-the-art ensemble models.

Sound Event Localization and Detection using Squeeze-Excitation Residual CNNs

This work aims to improve the accuracy results of the baseline CRNN presented in DCASE 2020 Task 3 by adding residual squeeze-excitation blocks in the convolutional part of the CRNN.
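A residual squeeze-excitation (SE) block rescales each channel of a convolutional feature map with a weight computed from that channel's global average, and adds the result to an identity shortcut. A minimal NumPy sketch follows; the (channels, time, frequency) layout, the reduction ratio `r`, the placement of the final ReLU, and the hypothetical `branch` callable standing in for the convolutional layers of the residual branch are all assumptions, not details of the authors' implementation.

```python
import numpy as np

def squeeze_excitation(u, w1, w2):
    """Channel-wise recalibration of a feature map u with shape (C, T, F).

    w1: (C // r, C) bottleneck weights; w2: (C, C // r) expansion weights.
    """
    z = u.mean(axis=(1, 2))                 # squeeze: global average pool -> (C,)
    h = np.maximum(w1 @ z, 0.0)             # bottleneck fully connected layer + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))     # fully connected layer + sigmoid -> (C,)
    return u * s[:, None, None]             # excitation: rescale every channel

def residual_se(x, branch, w1, w2):
    """Residual unit: identity shortcut plus the SE-recalibrated branch output."""
    return np.maximum(x + squeeze_excitation(branch(x), w1, w2), 0.0)

# Usage with random weights and a toy branch (hypothetical, for shapes only)
rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.standard_normal((C, 16, 32))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
y = residual_se(x, lambda t: 0.5 * t, w1, w2)
```

Because the channel weights lie in (0, 1), the SE path can only attenuate channels of the branch output, while the shortcut keeps the input signal intact.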

Sound Event Localization Based on Sound Intensity Vector Refined by Dnn-Based Denoising and Source Separation

The sound intensity vectors (IVs) used for physics-based DOA estimation are refined by DNN-based denoising and source separation, which enables accurate DOA estimation for both single and overlapping sources with a spherical microphone array.

Sound Event Localization and Detection using CRNN Architecture with Mixup for Model Generalization

The proposed architecture is based on a convolutional recurrent neural network (CRNN) and introduces rectangular kernels in the pooling layers to minimize information loss along the temporal dimension within the CNN module, which improves the performance of the RNN module.
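Pooling with a rectangular kernel, for example 1 × 4 over (time, frequency), downsamples the frequency axis while keeping every frame, whereas a square 2 × 2 kernel halves both axes. The sketch below shows the shape difference with non-overlapping max pooling in NumPy; the helper name `max_pool_2d` and the kernel sizes are illustrative assumptions.

```python
import numpy as np

def max_pool_2d(x, kt, kf):
    """Non-overlapping max pooling of x (C, T, F) with a kt x kf kernel.

    T and F are assumed to be divisible by kt and kf, respectively.
    """
    c, t, f = x.shape
    # Split each axis into (blocks, kernel) and take the max inside each block.
    return x.reshape(c, t // kt, kt, f // kf, kf).max(axis=(2, 4))

# Usage: 64 channels, 128 frames, 64 frequency bins
feat = np.zeros((64, 128, 64))
square = max_pool_2d(feat, 2, 2)       # time resolution halved
rectangular = max_pool_2d(feat, 1, 4)  # all 128 frames kept
```

Keeping the full frame rate in the CNN output means the recurrent layers that follow still see one feature vector per input frame.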

Sound source detection, localization and classification using consecutive ensemble of CRNN models

This paper decomposes the SELD task into four subtasks: estimating the number of active sources, estimating the direction of arrival of a single source, estimating the direction of arrival of a second source given the direction of the first, and multi-label classification. Four single-output, SELDnet-like CRNN models run consecutively to recover as much information as possible about the occurring events.

TASK 3 DCASE 2020: SOUND EVENT LOCALIZATION AND DETECTION USING RESIDUAL SQUEEZE-EXCITATION CNNS Technical Report

This work aims to improve the accuracy results of the baseline CRNN by adding residual squeeze-excitation blocks in the convolutional part of the CRNN, and shows that by simply introducing the residual SE blocks, the results obtained in the development phase clearly exceed the baseline.

References


Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy

Experimental results show that the proposed two-stage polyphonic sound event detection and localization method is able to improve the performance of both SED and DOAE, and also performs significantly better than the baseline method.
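In the two-stage strategy, SED and DOA estimation are handled by separate stages, and a DOA output for a class is only meaningful where the SED stage reports that class as active. A minimal sketch of that gating step at inference follows; the frame-wise list layout, the (azimuth, elevation) degree units, and the 0.5 threshold are assumptions rather than details taken from the paper.

```python
def combine_two_stage(sed_probs, doa_preds, threshold=0.5):
    """Attach a DOA estimate to each event detected by the SED stage.

    sed_probs: per-frame lists of per-class probabilities, shape [T][K].
    doa_preds: per-frame lists of per-class (azimuth, elevation) in degrees.
    Returns one dict per frame mapping class index -> (azimuth, elevation).
    """
    events = []
    for frame_probs, frame_doas in zip(sed_probs, doa_preds):
        # Keep the DOA output only for classes the SED stage considers active.
        active = {k: frame_doas[k] for k, p in enumerate(frame_probs) if p > threshold}
        events.append(active)
    return events

# Usage: two frames, three classes
sed = [[0.9, 0.1, 0.7], [0.2, 0.3, 0.4]]
doa = [[(10, 0), (50, 20), (-90, 10)], [(0, 0), (0, 0), (0, 0)]]
result = combine_two_stage(sed, doa)
```

Gating this way lets the DOA stage ignore class identity errors it cannot fix, at the cost of losing any DOA estimate for events the SED stage misses.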

Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks

The proposed convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space is generic and applicable to any array structure, and it is robust to unseen DOA values, reverberation, and low-SNR scenarios.
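DOA estimates in 3-D are usually compared by the angular distance on the unit sphere. Working with unit vectors, as in the sketch below, also sidesteps the azimuth wrap-around at ±180°. The (azimuth, elevation) convention in degrees and the helper names are assumptions for illustration.

```python
import math

def sph_to_cart(azimuth_deg, elevation_deg):
    """Unit vector for a direction given as azimuth/elevation in degrees."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def angular_error_deg(doa_a, doa_b):
    """Great-circle angle in degrees between two (azimuth, elevation) directions."""
    a, b = sph_to_cart(*doa_a), sph_to_cart(*doa_b)
    dot = sum(x * y for x, y in zip(a, b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

# Usage: 170 deg and -170 deg azimuth are 20 deg apart, not 340
err = angular_error_deg((170, 0), (-170, 0))
```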

Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems

This paper proposes generic cross-task baseline systems based on convolutional neural networks (CNNs) and finds that the 9-layer CNN with average pooling is a good model for a majority of the DCASE 2019 tasks.

Metrics for Polyphonic Sound Event Detection

This paper presents and discusses various metrics proposed for the evaluation of polyphonic sound event detection systems used in realistic situations, where multiple sound sources are typically active.
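The segment-based F1 score and error rate discussed in that paper can be sketched in plain Python. The sketch assumes each class's activity has already been aggregated into fixed-length segments (for example one second; the segment length is an assumption here) and returns micro-averaged scores; returning 0.0 for empty inputs is an arbitrary choice.

```python
def segment_based_metrics(reference, estimated):
    """Segment-based F1 score and error rate (ER) for polyphonic SED.

    reference, estimated: lists of segments; each segment is a list of
    0/1 activity flags, one per sound class. Returns (f1, er).
    """
    tp = fp = fn = 0
    substitutions = deletions = insertions = 0
    n_reference = 0
    for ref_seg, est_seg in zip(reference, estimated):
        seg_tp = sum(1 for r, e in zip(ref_seg, est_seg) if r and e)
        seg_fp = sum(1 for r, e in zip(ref_seg, est_seg) if e and not r)
        seg_fn = sum(1 for r, e in zip(ref_seg, est_seg) if r and not e)
        tp += seg_tp
        fp += seg_fp
        fn += seg_fn
        # Within a segment, a missed class paired with a false alarm counts
        # as one substitution; the leftovers are deletions or insertions.
        substitutions += min(seg_fn, seg_fp)
        deletions += max(0, seg_fn - seg_fp)
        insertions += max(0, seg_fp - seg_fn)
        n_reference += sum(1 for r in ref_seg if r)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    er = (substitutions + deletions + insertions) / n_reference if n_reference else 0.0
    return f1, er

# Usage: two segments, three classes
ref = [[1, 0, 1], [0, 1, 0]]
est = [[1, 1, 0], [0, 1, 0]]
f1, er = segment_based_metrics(ref, est)  # f1 = 2/3, er = 1/3
```

The error rate can exceed 1 when a system produces many insertions, which is why it is usually reported alongside the F1 score.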

Deep Neural Networks for Multiple Speaker Detection and Localization

This paper proposes a likelihood-based encoding of the network output, which naturally allows the detection of an arbitrary number of sources, and investigates the use of sub-band cross-correlation information as features for better localization in sound mixtures.
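A likelihood-based output encoding can be illustrated for azimuth only: the target for each discrete direction is a Gaussian-shaped function of its distance to the nearest source, and sources are decoded as peaks above a threshold, so zero, one, or several sources fit the same output layout. The one-degree bins, σ = 8°, the 0.5 threshold, the restriction to azimuth, and all function names are assumptions for this sketch, not values from the paper.

```python
import math

def azimuth_distance(a, b):
    """Absolute azimuth difference in degrees, wrapped into [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def encode_likelihood(source_azimuths, n_bins=360, sigma=8.0):
    """Training target: each azimuth bin takes a Gaussian-shaped value
    from its closest source; with no sources every bin is 0."""
    bins = [i * 360.0 / n_bins for i in range(n_bins)]
    return [
        max((math.exp(-azimuth_distance(b, s) ** 2 / sigma ** 2) for s in source_azimuths),
            default=0.0)
        for b in bins
    ]

def decode_peaks(likelihood, threshold=0.5):
    """Azimuths of circular local maxima whose value exceeds the threshold."""
    n = len(likelihood)
    peaks = []
    for i, v in enumerate(likelihood):
        # likelihood[i - 1] wraps to the last bin when i == 0.
        if v > threshold and v >= likelihood[i - 1] and v >= likelihood[(i + 1) % n]:
            peaks.append(i * 360.0 / n)
    return peaks

# Usage: two simultaneous sources
target = encode_likelihood([30.0, 200.0])
detected = decode_peaks(target)
```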

A neural network based algorithm for speaker localization in a multi-room environment

A neural-network-based speaker localization algorithm for multi-room domestic scenarios is proposed; it outperforms the reference algorithm, with an average localization error (RMSE) of 525 mm versus 1465 mm.

Indoor Sound Source Localization With Probabilistic Neural Network

Results show that the proposed GCA localizes accurately and robustly in diverse indoor applications where the acoustic features of the site can be studied before the localization stage, and that it significantly increases the success rate of direction-of-arrival estimation while remaining robust to environmental changes.

Sound Source Localization in a Multipath Environment Using Convolutional Neural Networks

It is shown that CNNs operating on cepstrogram and generalized cross-correlogram inputs are able to estimate more reliably the instantaneous range and bearing of transiting motor vessels when the source localization performance of conventional passive ranging methods is degraded.

The generalized correlation method for estimation of time delay

A maximum likelihood (ML) estimator is developed for determining time delay between signals received at two spatially separated sensors in the presence of uncorrelated noise. This ML estimator can be
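The generalized correlation method weights the cross-spectrum of two sensor signals before transforming it back to the lag domain, and takes the lag of the peak as the time-delay estimate. The sketch below swaps in the phase transform (PHAT) weighting, a widely used choice for GCC features, instead of the ML weighting derived in the paper; the function name, the zero-padding length, and the `1e-12` regularizer are assumptions.

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate how many seconds `sig` lags `ref` with PHAT-weighted GCC.

    Returns (delay_seconds, correlation) with the correlation ordered from
    lag -n//2 to +n//2 samples.
    """
    n = len(sig) + len(ref)                    # zero-pad to avoid circular wrap-around
    spec = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    spec /= np.abs(spec) + 1e-12               # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(spec, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # negative lags first
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs, cc

# Usage: white noise delayed by 5 samples at 16 kHz
rng = np.random.default_rng(1)
ref = rng.standard_normal(1024)
sig = np.concatenate((np.zeros(5), ref[:-5]))
tau, _ = gcc_phat(sig, ref, fs=16000)
```

For a feature extractor, the correlation vector itself (restricted to the physically possible lags for the microphone spacing) is typically kept rather than only the peak lag.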

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
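Batch normalization standardizes each feature with the mean and variance of the current mini-batch, then applies a learned scale γ and shift β; at inference, stored estimates of the statistics are used instead. A minimal NumPy sketch for inputs of shape (N, D) follows; how the running estimates are accumulated (for example a momentum average) varies by framework and is not shown, and the function names are hypothetical.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Normalize x (N, D) with mini-batch statistics; also return them."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta, mean, var

def batch_norm_infer(x, gamma, beta, running_mean, running_var, eps=1e-5):
    """Normalize x with fixed statistics collected during training."""
    return gamma * (x - running_mean) / np.sqrt(running_var + eps) + beta

# Usage: 32 examples with 4 features, scale 2 and shift 1
rng = np.random.default_rng(0)
x = 3.0 * rng.standard_normal((32, 4)) + 5.0
gamma, beta = np.full(4, 2.0), np.ones(4)
y, mean, var = batch_norm_train(x, gamma, beta)
```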