Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis
TLDR
This paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, which combines part of last year's dataset with an additional synthetic, strongly labeled dataset provided this year and described in more detail.
Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments
TLDR
This paper presents DCASE 2018 task 4, which evaluates systems for the large-scale detection of sound events using weakly labeled data (without time boundaries) and explores the possibility of exploiting a large amount of unbalanced and unlabeled training data together with a small weakly labeled training set to improve system performance.
CRNN-Based Multiple DoA Estimation Using Acoustic Intensity Features for Ambisonics Recordings
TLDR
This work proposes to use a neural network built from stacked convolutional and recurrent layers in order to estimate the directions of arrival of multiple sources from a first-order Ambisonics recording, using features derived from the acoustic intensity vector as inputs.
Low-rank Approximation Based Multichannel Wiener Filter Algorithms for Noise Reduction with Application in Cochlear Implants
TLDR
This paper presents low-rank approximation based multichannel Wiener filter algorithms for noise reduction in speech-plus-noise scenarios, with application in cochlear implants, and introduces a more robust rank-1, or more generally rank-R, approximation of the autocorrelation matrix of the speech signal.
CRNN-based Joint Azimuth and Elevation Localization with the Ambisonics Intensity Vector
TLDR
A source localization system for first-order Ambisonics (FOA) content, based on a stacked convolutional and recurrent neural network (CRNN) that uses the FOA acoustic intensity vector, which is easy to compute and closely linked to the sound direction of arrival (DoA).
Vocal tract length normalisation approaches to DNN-based children's and adults' speech recognition
TLDR
Both VTLN-based approaches are shown to improve phone error rate, by up to 20% relative, compared to a baseline trained on a mixture of children's and adults' speech.
Feature Learning With Matrix Factorization Applied to Acoustic Scene Classification
TLDR
It is shown that the unsupervised learning methods provide better representations of acoustic scenes than the best conventional hand-crafted features on both datasets, and that a novel nonnegative supervised matrix factorization model and deep neural networks trained on spectrograms allow for further improvements.
Acoustic Features for Environmental Sound Analysis
TLDR
The general processing chain to convert a sound signal to a feature vector that can be efficiently exploited by a classifier, and the relation to features used for speech and music processing, are described in this chapter.
Sound Event Detection in Synthetic Domestic Environments
TLDR
A comparative analysis of the performance of state-of-the-art sound event detection systems based on the results of task 4 of the DCASE 2019 challenge, where submitted systems were evaluated on a series of synthetic soundscapes that allow us to carefully control for different soundscape characteristics.
...
...