Performance measurement in blind audio source separation
TLDR
This paper considers four different sets of allowed distortions in blind audio source separation algorithms, from time-invariant gains to time-varying filters, and derives a global performance measure using an energy ratio, plus a separate performance measure for each error term.
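As a rough illustration of an energy-ratio measure, the sketch below covers only the simplest of the allowed distortions, a time-invariant gain. It assumes the target component is the orthogonal projection of the estimate onto the true source and that everything left over is error. The function name `sdr_gain_invariant` and the single-error-term simplification are assumptions made for this example, not the paper's full definition, which splits the error into separate terms.

```python
import numpy as np

def sdr_gain_invariant(estimate, reference):
    """Energy ratio (in dB) between target and error, allowing a fixed gain.

    Hypothetical helper for illustration: the target is the scaled copy of
    the reference that best matches the estimate in the least-squares sense.
    """
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Least-squares gain: project the estimate onto the reference signal.
    gain = np.dot(estimate, reference) / np.dot(reference, reference)
    target = gain * reference
    # Everything not explained by the scaled reference counts as error.
    error = estimate - target
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(error ** 2))

# Toy signals: the estimate is twice the reference plus an orthogonal error.
reference = np.array([1.0, 0.0, 1.0, 0.0])
estimate = 2.0 * reference + np.array([0.0, 1.0, 0.0, -1.0])
score_db = sdr_gain_invariant(estimate, reference)
```

Because the gain is fitted before the ratio is taken, rescaling the estimate leaves the score unchanged, which is the point of treating a time-invariant gain as an allowed distortion.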
The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines
TLDR
This paper presents the design and outcomes of the 3rd CHiME Challenge, which targets the performance of automatic speech recognition in a real-world, commercially motivated scenario: a person talking to a tablet device fitted with a six-channel microphone array.
DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System
TLDR
This paper presents the setup of the DCASE2017 Challenge tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset.
Under-Determined Reverberant Audio Source Separation Using a Full-Rank Spatial Covariance Model
This paper addresses the modeling of reverberant recording environments in the context of under-determined convolutive blind source separation. We model the contribution of each source to all mixture
Subjective and Objective Quality Assessment of Audio Source Separation
TLDR
This paper proposes a family of objective measures that aim to predict subjective scores. The measures decompose the estimation error into several distortion components and use the PEMO-Q perceptual salience measure to provide multiple features, which are then combined.
The Diverse Environments Multi-channel Acoustic Noise Database (DEMAND): A database of multichannel environmental noise recordings
TLDR
This paper provides DEMAND (Diverse Environments Multi-channel Acoustic Noise Database), a set of 16-channel noise files recorded in a variety of indoor and outdoor settings, to encourage research into algorithms beyond the stereo setup.
Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR
TLDR
It is demonstrated that LSTM speech enhancement, even when used 'naively' as front-end processing, delivers competitive results on the CHiME-2 speech recognition task.
A General Flexible Framework for the Handling of Prior Information in Audio Source Separation
TLDR
This paper introduces a general audio source separation framework based on a library of structured source models that enable the incorporation of prior knowledge about each source via user-specifiable constraints.
Adaptive Harmonic Spectral Decomposition for Multiple Pitch Estimation
TLDR
An NMF-like algorithm is derived that performs similarly to supervised NMF using pre-trained piano spectra but improves pitch estimation performance by 6% to 10% compared to alternative unsupervised NMF algorithms.
...