Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
TLDR
The proposed convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space is generic and applicable to any array structure, and is robust to unseen DOA values, reverberation, and low-SNR scenarios.
Direction of Arrival Estimation for Multiple Sound Sources Using Convolutional Recurrent Neural Network
TLDR
The results show that the proposed DOAnet is capable of estimating the number of sources and their respective DOAs with good precision, and of generating an SPS with a high signal-to-noise ratio.
Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features
TLDR
The proposed SED system is compared against the state-of-the-art mono-channel method on the development subset of the TUT sound events detection 2016 database, and the use of spatial and harmonic features is shown to improve SED performance.
Sound event detection using spatial features and convolutional recurrent neural network
TLDR
This paper proposes using low-level spatial features extracted from multichannel audio for sound event detection and shows that, instead of concatenating the features of each channel into a single feature vector, the network learns sound events in multichannel audio better when the features are presented as separate layers of a volume.
A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection
TLDR
This report presents the dataset and the evaluation setup of the Sound Event Localization & Detection (SELD) task for the DCASE 2020 Challenge, along with a baseline that is an updated version of the one used in the previous challenge, with input features and training modifications to improve its performance.
A report on sound event detection with different binaural features
TLDR
Three different binaural features are studied and evaluated on the publicly available TUT Sound Events 2017 dataset and are found to perform consistently as well as or better than single-channel features with respect to the error rate metric.
Automated audio captioning with recurrent neural networks
TLDR
Results from metrics show that the proposed method can predict words appearing in the original caption, but not always correctly ordered.
A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection
TLDR
To investigate the individual and combined effects of ambient noise, interferers, and reverberation, the baseline is evaluated on different versions of the dataset that exclude or include combinations of these factors; the results indicate that by far the most detrimental effects are caused by directional interferers.
A multi-room reverberant dataset for sound event localization and detection
TLDR
This paper presents the sound event localization and detection (SELD) task setup for the DCASE 2019 challenge, in which the goal is to detect the temporal activities of a known set of sound event classes and to localize them in space when active.
Joint Measurement of Localization and Detection of Sound Events
TLDR
This paper proposes augmenting the localization metrics with a condition related to detection and, conversely, using location information when calculating the true positives for detection.