A Sequence Matching Network for Polyphonic Sound Event Localization and Detection

@article{Nguyen2020ASM,
  title={A Sequence Matching Network for Polyphonic Sound Event Localization and Detection},
  author={Thi Ngoc Tho Nguyen and Douglas L. Jones and Woonseng Gan},
  journal={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2020},
  pages={71--75}
}
Polyphonic sound event detection and direction-of-arrival estimation require different input features from audio signals. While sound event detection mainly relies on time-frequency patterns, direction-of-arrival estimation relies on magnitude or phase differences between microphones. Previous approaches use the same input features for sound event detection and direction-of-arrival estimation, and train the two tasks jointly or in a two-stage transfer-learning manner. We propose a two-step… 
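The contrast between the two kinds of input feature can be illustrated with a small sketch. It is not the paper's pipeline: the log-magnitude spectrogram stands in for a time-frequency feature suited to sound event detection, and GCC-PHAT is one common way to expose the inter-microphone phase difference, chosen here purely as an assumption. All function names, the frame sizes, and the synthetic two-microphone signal are hypothetical.

```python
import numpy as np


def stft(x, n_fft=256, hop=128):
    """Short-time Fourier transform with a Hann window; returns (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def log_magnitude(x):
    """Detection-style feature: log-magnitude spectrogram of a single channel."""
    return np.log(np.abs(stft(x)) + 1e-8)


def gcc_phat_delay(x1, x2, max_lag=16):
    """Localization-style cue: delay of x2 relative to x1, in samples (GCC-PHAT)."""
    n = len(x1) + len(x2)  # zero-pad so the correlation is linear, not circular
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that lags run from -max_lag to +max_lag.
    cc = np.concatenate([cc[-max_lag:], cc[: max_lag + 1]])
    return int(np.argmax(cc)) - max_lag


# Synthetic example: the second microphone hears the first one 5 samples late.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(4096)
true_delay = 5
x2 = np.concatenate([np.zeros(true_delay), x1[:-true_delay]])

sed_feature = log_magnitude(x1)  # uses one channel only
delay = gcc_phat_delay(x1, x2)   # needs both channels
```

The spectrogram is computed from a single channel and carries no directional information, whereas the delay estimate only exists for a pair of channels. This is the sense in which the two tasks draw on different properties of the signal.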

DCASE 2020 TASK 3: ENSEMBLE OF SEQUENCE MATCHING NETWORKS FOR DYNAMIC SOUND EVENT LOCALIZATION, DETECTION, AND TRACKING Technical Report

TLDR
In order to estimate directions-of-arrival of moving sound sources with high spatial resolution, it is proposed to separate the directional estimations into azimuth and elevation before passing them to the sequence matching network.

An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection

TLDR
The proposed EINV2 for joint SED and DoA estimation outperforms previous methods by a large margin, and has comparable performance to state-of-the-art ensemble models.

Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019

TLDR
An overview of the first international evaluation on sound event localization and detection, organized as a task of the DCASE 2019 Challenge, presents in detail how the systems were evaluated and ranked and the characteristics of the best-performing systems.

ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection

TLDR
In experimental evaluations with the DCASE 2020 Task 3 dataset, the ACCDOA representation outperformed the two-branch representation in SELD metrics with a smaller network size and performed better than state-of-the-art SELD systems in terms of localization and location-dependent detection.

A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection

TLDR
To investigate the individual and combined effects of ambient noise, interferers, and reverberation, the baseline is evaluated on versions of the dataset that exclude or include combinations of these factors; the results indicate that directional interferers are by far the most detrimental.

A General Network Architecture for Sound Event Localization and Detection Using Transfer Learning and Recurrent Neural Network

TLDR
The experimental results using the DCASE 2020 SELD dataset show that the performances of the proposed network architecture using different SED and DOA estimation algorithms and different audio formats are competitive with other state-of-the-art SELD algorithms.

Query-graph with Cross-gating Attention Model for Text-to-Audio Grounding

TLDR
A Query Graph with Cross-gating Attention (QGCA) model is proposed that models the comprehensive relations between the words in the query through a novel query graph; a cross-modal attention module that assigns higher weights to the keywords is introduced to generate the snippet-specific query representations.

SALSA-Lite: A Fast and Effective Feature for Polyphonic Sound Event Localization and Detection with Microphone Arrays

TLDR
Experimental results on the TAU-NIGENS Spatial Sound Events 2021 dataset showed that the SALSA-Lite feature achieved competitive performance compared to the full SALSA feature, and significantly outperformed the traditional feature set of multichannel log-mel spectrograms with generalized cross-correlation spectra.

Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training

TLDR
In evaluations with the DCASE 2021 Task 3 dataset, the model trained with the multi-ACCDOA format and with the class-wise ADPIT detects overlapping events from the same class while maintaining its performance in the other cases.

Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection

TLDR
This paper proposes an impulse response simulation framework (IRS) that augments spatial characteristics using simulated room impulse responses (RIRs), and presents an ablation study on the contribution of and need for each component within the IRS.

References

Showing 1-10 of 25 references

Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks

TLDR
The proposed convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space is generic, applicable to any array structure, and robust to unseen DOA values, reverberation, and low-SNR scenarios.

Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy

TLDR
Experimental results show that the proposed two-stage polyphonic sound event detection and localization method is able to improve the performance of both SED and DOAE, and also performs significantly better than the baseline method.

TWO-STAGE SOUND EVENT LOCALIZATION AND DETECTION USING INTENSITY VECTOR AND GENERALIZED CROSS-CORRELATION Technical Report

TLDR
A two-stage polyphonic sound event detection and localization method is proposed that localizes and detects overlapping sound events in different environments, improves the performance of both SED and DOA estimation, and performs significantly better than the baseline method.

A multi-room reverberant dataset for sound event localization and detection

TLDR
This paper presents the sound event localization and detection (SELD) task setup for the DCASE 2019 challenge to detect the temporal activities of a known set of sound event classes, and further localize them in space when active.

Sound source detection, localization and classification using consecutive ensemble of CRNN models

TLDR
This paper uses four CRNN SELDnet-like single-output models, run consecutively, to recover all possible information about occurring events. The SELD task is decomposed into estimating the number of active sources, estimating the direction of arrival of a single source, estimating the direction of arrival of the second source when the direction of the first one is known, and a multi-label classification task.

Recurrent neural networks for polyphonic sound event detection in real life recordings

In this paper we present an approach to polyphonic sound event detection in real life recordings based on bi-directional long short term memory (BLSTM) recurrent neural networks (RNNs). A single

Sound event detection using spatial features and convolutional recurrent neural network

TLDR
This paper proposes to use low-level spatial features extracted from multichannel audio for sound event detection and shows that, instead of concatenating the features of each channel into a single feature vector, the network learns sound events in multichannel audio better when they are presented as separate layers of a volume.

Metrics for Polyphonic Sound Event Detection

This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources

Direction of Arrival Estimation for Multiple Sound Sources Using Convolutional Recurrent Neural Network

TLDR
The results show that the proposed DOAnet is capable of estimating the number of sources and their respective DOAs with good precision, and of generating SPS with a high signal-to-noise ratio.

Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

TLDR
This work combines convolutional and recurrent neural networks in a convolutional recurrent neural network (CRNN), applies it to a polyphonic sound event detection task, and observes a considerable improvement on four different datasets consisting of everyday sound events.