An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection

@article{Cao2021AnIE,
  title={An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection},
  author={Yin Cao and Turab Iqbal and Qiuqiang Kong and Yue Zhong and Wenwu Wang and Mark D. Plumbley},
  journal={ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2021},
  pages={885--889}
}
  • Yin Cao, Turab Iqbal, Mark D. Plumbley
  • Published 30 September 2020
  • Computer Science
  • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Polyphonic sound event localization and detection (SELD), which jointly performs sound event detection (SED) and direction-of-arrival (DoA) estimation, detects the type and occurrence time of sound events as well as their corresponding DoA angles simultaneously. We study the SELD task from a multi-task learning perspective. Two open problems are addressed in this paper. Firstly, to detect overlapping sound events of the same type but with different DoAs, we propose to use a trackwise output… 
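With a track-wise output, the network emits a fixed number of tracks, each carrying an event activity and a DoA. The order in which overlapping events land on those tracks is arbitrary. The truncated abstract does not show how the paper handles this. One common approach is a permutation-invariant training (PIT) loss: it scores every assignment of predicted tracks to reference tracks and keeps the cheapest.

The sketch below makes several assumptions, none of which come from the paper:
- It handles a single frame.
- Each track has one SED probability and one Cartesian DoA vector.
- It uses binary cross-entropy for SED and mean squared error for DoA.
- The helper names `track_loss` and `pit_loss` are hypothetical.

It illustrates the idea and is not the paper's implementation.

```python
import itertools
import math
from typing import Sequence, Tuple

# Hypothetical per-frame track: (SED activity probability, (x, y, z) DoA vector).
Track = Tuple[float, Tuple[float, float, float]]


def bce(p: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one activity probability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def track_loss(pred: Track, ref: Track) -> float:
    """SED loss plus DoA mean squared error.

    Assumption: the DoA term only counts when the reference track is active.
    """
    p, doa_p = pred
    t, doa_t = ref
    mse = sum((a - b) ** 2 for a, b in zip(doa_p, doa_t)) / 3.0
    return bce(p, t) + t * mse


def pit_loss(preds: Sequence[Track], refs: Sequence[Track]) -> Tuple[float, Tuple[int, ...]]:
    """Minimum total loss over all assignments of predicted tracks to reference tracks."""
    best_loss, best_perm = math.inf, tuple(range(len(refs)))
    for perm in itertools.permutations(range(len(refs))):
        # perm[i] is the index of the reference track assigned to predicted track i.
        total = sum(track_loss(preds[i], refs[perm[i]]) for i in range(len(preds)))
        if total < best_loss:
            best_loss, best_perm = total, perm
    return best_loss, best_perm
```

Suppose the network places two events on the opposite tracks from the references. `pit_loss` then returns the swapped permutation `(1, 0)` and does not penalize the swap. Exhaustive search is cheap for the two or three tracks these systems typically use.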


A Track-Wise Ensemble Event Independent Network for Polyphonic Sound Event Localization and Detection
TLDR
A track-wise ensemble event independent network is proposed. It is based on the previously proposed Event-Independent Network V2, extended with conformer blocks and dense blocks, and trained with a novel data augmentation method. It addresses a problem specific to ensembling models that use the track-wise output format: the track order may differ among models.
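The problem named above arises because two independently trained track-wise models may place the same event on different tracks. Averaging their outputs track by track can then mix unrelated events. One simple way to illustrate an alignment step is to permute the tracks of each additional model to best match a reference model, and then average. This is not necessarily the method used in that paper.

The sketch assumes:
- a single frame;
- the same number of tracks in every model;
- a squared-distance matching criterion;
- hypothetical helper names `align_tracks` and `ensemble_frame`.

```python
import itertools
from typing import List, Sequence

# Hypothetical track output: e.g. class activities and DoA values concatenated.
Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two track output vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align_tracks(ref: Sequence[Vector], other: Sequence[Vector]) -> List[Vector]:
    """Reorder the tracks of `other` so that they best match `ref` (exhaustive search)."""
    best_perm = min(
        itertools.permutations(range(len(other))),
        key=lambda perm: sum(sq_dist(ref[i], other[j]) for i, j in enumerate(perm)),
    )
    return [list(other[j]) for j in best_perm]


def ensemble_frame(outputs: Sequence[Sequence[Vector]]) -> List[Vector]:
    """Average the track-wise outputs of several models after aligning each to the first."""
    ref = outputs[0]
    aligned = [[list(track) for track in ref]] + [align_tracks(ref, out) for out in outputs[1:]]
    n_models = len(aligned)
    return [
        [sum(model[t][k] for model in aligned) / n_models for k in range(len(ref[t]))]
        for t in range(len(ref))
    ]
```

Take two models that output the same two events in opposite track order. Naive track-by-track averaging blurs both tracks to `[0.5, 0.5]`, whereas the aligned average keeps them apart. For more tracks, the Hungarian algorithm is the usual replacement for exhaustive search.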
SOUND EVENT LOCALIZATION AND DETECTION USING CROSS-MODAL ATTENTION AND PARAMETER SHARING FOR DCASE2021 CHALLENGE
TLDR
Experimental results showed that the proposed model for DCASE2021 Challenge Task 3: Sound Event Localization and Detection (SELD) with Directional Interference significantly outperformed the baseline method.
A General Network Architecture for Sound Event Localization and Detection Using Transfer Learning and Recurrent Neural Network
TLDR
The experimental results on the DCASE 2020 SELD dataset show that the proposed network architecture, combined with different SED and DOA estimation algorithms and different audio formats, performs competitively with other state-of-the-art SELD algorithms.
A Method Based on Dual Cross-Modal Attention and Parameter Sharing for Polyphonic Sound Event Localization and Detection
TLDR
Experimental results demonstrate that the efficient model using one common decoder block based on the DCMA to predict multiple events in the track-wise output format is effective for the SELD task with up to three overlapping events.
Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training
TLDR
In evaluations with the DCASE 2021 Task 3 dataset, the model trained with the multi-ACCDOA format and with the class-wise ADPIT detects overlapping events from the same class while maintaining its performance in the other cases.
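In the ACCDOA (activity-coupled Cartesian DOA) representation, each class is encoded as one Cartesian vector. The vector's length encodes activity and its direction encodes the DOA. The multi-ACCDOA format adds a track dimension, so the same class can be active on more than one track at once.

The decoding sketch below assumes one frame of output laid out as tracks × classes × 3, and an activity threshold of 0.5. Both are assumptions, as is the helper name `decode_multi_accdoa`. Published systems may add post-processing, such as merging duplicate detections, which is omitted here.

```python
import math
from typing import List, Sequence, Tuple

# One frame of hypothetical multi-ACCDOA output: frame[track][class] = (x, y, z).
Frame = Sequence[Sequence[Tuple[float, float, float]]]
Event = Tuple[int, int, Tuple[float, float, float]]


def decode_multi_accdoa(frame: Frame, threshold: float = 0.5) -> List[Event]:
    """Return (track, class, unit DOA vector) for every vector longer than the threshold."""
    events = []
    for track, class_vectors in enumerate(frame):
        for cls, (x, y, z) in enumerate(class_vectors):
            norm = math.sqrt(x * x + y * y + z * z)
            if norm > threshold:
                # Vector length = activity; vector direction = DOA.
                events.append((track, cls, (x / norm, y / norm, z / norm)))
    return events
```

If class 0 exceeds the threshold on both tracks with different directions, the decoder reports two simultaneous events of the same class. This is exactly the case a single-track, class-wise output cannot represent.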
SELF-ATTENTION MECHANISM FOR SOUND EVENT LOCALIZATION AND DETECTION Technical Report
TLDR
This technical report describes the system submitted to DCASE 2021 Task 3: Sound Event Localization and Detection (SELD) with Directional Interference. It proposes an architecture called Many-to-Many Audio Spectrogram Transformer (M2MAST), which uses a pure Transformer to reduce the dependency on CNNs and to allow the output resolution to be changed easily.
Assessment of Self-Attention on Learned Features For Sound Event Localization and Detection
TLDR
The effect of multi-head self-attention (MHSA) on the SELD task is studied in detail, including the effect of replacing the RNN blocks with self-attention layers and the effects of position embeddings and layer normalization.
SoundDet: Polyphonic Sound Event Detection and Localization from Raw Waveform
TLDR
A new framework, SoundDet, is presented for polyphonic moving sound event detection and localization. It is end-to-end trainable and lightweight, and consists of a backbone neural network and two parallel heads for temporal detection and spatial localization.
SoundDet: Polyphonic Moving Sound Event Detection and Localization from Raw Waveform
TLDR
A new framework, SoundDet, is presented for polyphonic moving sound event detection and localization. It is end-to-end trainable and lightweight, and consists of a backbone neural network and two parallel heads for temporal detection and spatial localization.
What Makes Sound Event Localization and Detection Difficult? Insights from Error Analysis
TLDR
Experimental results indicate that polyphony is the main challenge in SELD, owing to the difficulty of detecting all sound events of interest. They also indicate that SELD systems tend to make fewer errors in the polyphonic scenario that is dominant in the training set.
...

References

SHOWING 1-10 OF 74 REFERENCES
A Sequence Matching Network for Polyphonic Sound Event Localization and Detection
TLDR
A two-step approach is proposed that decouples the learning of the sound event detection and direction-of-arrival estimation systems. It allows flexibility in the system design and improves the performance of the whole sound event localization and detection system.
Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy
TLDR
Experimental results show that the proposed two-stage polyphonic sound event detection and localization method is able to improve the performance of both SED and DOAE, and also performs significantly better than the baseline method.
Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
TLDR
The proposed convolutional recurrent neural network performs joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space. It is generic, applicable to any array structure, and robust to unseen DOA values, reverberation, and low-SNR scenarios.
Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3net
TLDR
Two systems that solve sound event localization and sound event detection simultaneously and a two-stage system that first handles the SED and SEL tasks individually and later combines those results are considered.
TWO-STAGE SOUND EVENT LOCALIZATION AND DETECTION USING INTENSITY VECTOR AND GENERALIZED CROSS-CORRELATION Technical Report
TLDR
A two-stage polyphonic sound event detection and localization method is proposed that can localize and detect overlapping sound events in different environments. It improves the performance of both SED and DOA estimation and performs significantly better than the baseline method.
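The intensity-vector feature named in this title is usually computed from first-order Ambisonics (FOA) spectrograms. It is the real part of the conjugated omnidirectional channel W multiplied by the three dipole channels X, Y and Z. Under the common FOA encoding of an ideal plane wave, this vector points toward the source direction.

The sketch below computes the vector for a single time-frequency bin and scales it to unit length. The following are all assumptions, not details taken from the report:
- the channel ordering;
- the unit-length normalization;
- the idealized plane-wave encoder;
- the function names `intensity_vector` and `plane_wave_foa`.

Real systems compute the feature over full STFTs and often project it onto mel bands.

```python
import math
from typing import Tuple


def intensity_vector(w: complex, x: complex, y: complex, z: complex) -> Tuple[float, float, float]:
    """Unit-length active intensity vector Re{conj(W) * [X, Y, Z]} for one time-frequency bin."""
    ix = (w.conjugate() * x).real
    iy = (w.conjugate() * y).real
    iz = (w.conjugate() * z).real
    norm = math.sqrt(ix * ix + iy * iy + iz * iz)
    if norm == 0.0:
        return (0.0, 0.0, 0.0)  # no directional energy in this bin
    return (ix / norm, iy / norm, iz / norm)


def plane_wave_foa(s: complex, azimuth_deg: float, elevation_deg: float) -> Tuple[complex, complex, complex, complex]:
    """Idealized FOA encoding of a plane wave (hypothetical helper; ignores SN3D/FuMa channel scaling)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (
        s,
        s * math.cos(az) * math.cos(el),
        s * math.sin(az) * math.cos(el),
        s * math.sin(el),
    )
```

Encoding a source at 90° azimuth and feeding the four channels to `intensity_vector` gives a vector close to (0, 1, 0). The complex source amplitude cancels out, which is why the feature is useful for DOA estimation regardless of the signal content.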
A multi-room reverberant dataset for sound event localization and detection
TLDR
This paper presents the sound event localization and detection (SELD) task setup for the DCASE 2019 challenge to detect the temporal activities of a known set of sound event classes, and further localize them in space when active.
Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019
TLDR
This overview of the first international evaluation on sound event localization and detection, organized as a task of the DCASE 2019 Challenge, presents in detail how the systems were evaluated and ranked, and the characteristics of the best-performing systems.
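DOA estimates in these evaluations are scored with the angular distance between the estimated and reference directions. The sketch below computes that distance, assuming directions are given as (azimuth, elevation) in degrees. It takes the arc cosine of the dot product of the corresponding unit vectors. The function names are hypothetical. The official DCASE metrics also match estimated events to reference events before averaging, which is omitted here.

```python
import math
from typing import Tuple


def to_unit_vector(azimuth_deg: float, elevation_deg: float) -> Tuple[float, float, float]:
    """Convert azimuth/elevation in degrees to a Cartesian unit vector."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def angular_error_deg(est: Tuple[float, float], ref: Tuple[float, float]) -> float:
    """Great-circle angle in degrees between two (azimuth, elevation) directions."""
    u, v = to_unit_vector(*est), to_unit_vector(*ref)
    dot = sum(a * b for a, b in zip(u, v))
    # Clamp to [-1, 1] so that floating-point round-off cannot break acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))
```

Working on the sphere handles azimuth wrap-around automatically. Estimates at -170° and 170° azimuth are 20° apart, although a naive azimuth difference would report 340°.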
A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection
TLDR
This report presents the dataset and the evaluation setup of the Sound Event Localization & Detection (SELD) task for the DCASE 2020 Challenge. It also presents a baseline system, an updated version of the one used in the previous challenge, with input features and training modifications to improve its performance.
Metrics for Polyphonic Sound Event Detection
This paper presents and discusses various metrics proposed for evaluating polyphonic sound event detection systems in realistic situations, where there are typically multiple sound sources.
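Among the metrics discussed there are the segment-based F-score and error rate for polyphonic SED. The sketch below assumes each segment is given as a set of active reference labels and a set of active estimated labels; the function name and this input layout are assumptions.

For each segment, the counts are:
- S = min(FN, FP) substitutions;
- D = max(0, FN - FP) deletions;
- I = max(0, FP - FN) insertions.

The error rate is ER = (ΣS + ΣD + ΣI) / ΣN, where N is the number of active reference events. F1 is computed from true positives, false positives and false negatives accumulated over all segments.

```python
from typing import Dict, Sequence, Set


def segment_metrics(refs: Sequence[Set[str]], ests: Sequence[Set[str]]) -> Dict[str, float]:
    """Segment-based error rate and F1 for polyphonic SED, accumulated over all segments."""
    tp = fp = fn = subs = dels = ins = n_ref = 0
    for ref, est in zip(refs, ests):
        seg_tp = len(ref & est)
        seg_fp = len(est - ref)
        seg_fn = len(ref - est)
        tp, fp, fn = tp + seg_tp, fp + seg_fp, fn + seg_fn
        subs += min(seg_fn, seg_fp)      # substitutions
        dels += max(0, seg_fn - seg_fp)  # deletions
        ins += max(0, seg_fp - seg_fn)   # insertions
        n_ref += len(ref)                # active reference events
    # Assumption: report 0.0 when a metric's denominator is zero.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    er = (subs + dels + ins) / n_ref if n_ref else 0.0
    return {"f1": f1, "error_rate": er}
```

Because insertions are counted in the numerator but not in N, the error rate can exceed 1 for a system that produces many false alarms.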
Guided Learning Convolution System for DCASE 2019 Task 4
TLDR
The system submitted to DCASE2019 Task 4 (sound event detection (SED) in domestic environments) is a convolutional neural network with an embedding-level attention pooling module. It achieves the best performance compared with the systems of the other participants.
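Embedding-level attention pooling aggregates frame-level embeddings into one clip-level embedding using learned weights. In one common formulation, each frame receives a score, the scores are normalized with a softmax over time, and the embeddings are summed with those weights.

In the sketch below, the scores are passed in directly; in a real model they would come from a learned layer. The function names are hypothetical, and the exact formulation used by the cited system may differ.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the time axis."""
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(embeddings: Sequence[Sequence[float]], scores: Sequence[float]) -> List[float]:
    """Weighted sum of frame embeddings, with weights = softmax(scores) over frames."""
    weights = softmax(scores)
    dim = len(embeddings[0])
    return [sum(w * frame[k] for w, frame in zip(weights, embeddings)) for k in range(dim)]
```

With equal scores this reduces to average pooling. Learned scores instead let the frames in which an event is present dominate the clip-level embedding.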
...