Ngoc Q. K. Duong

This paper addresses the modeling of reverberant recording environments in the context of under-determined convolutive blind source separation. We model the contribution of each source to all mixture channels in the time-frequency domain as a zero-mean Gaussian random variable whose covariance encodes the spatial characteristics of the source. We then…
This paper provides an overview of the Predicting Media Interestingness task, organized as part of the MediaEval 2016 Benchmarking Initiative for Multimedia Evaluation. The task, which is running for the first year, expects participants to create systems that automatically select images and video segments that are considered to be the most…
We present the outcomes of three recent evaluation campaigns in the field of audio and biomedical source separation. These campaigns have witnessed a boom in the range of applications of source separation systems in the last few years, as shown by the growth in the number of datasets from 1 to 9 and in the number of submissions from 15 to 34. We first…
We consider a single-channel source separation problem that consists of separating speech from a nonstationary background such as music. We introduce a novel approach called text-informed separation, where the source separation process is guided by the corresponding textual information. First, given the text, we propose to produce a speech example via either a…
This paper introduces the audio part of the 2010 community-based Signal Separation Evaluation Campaign (SiSEC2010). Seven speech and music datasets were contributed, which include datasets recorded in noisy or dynamic environments, in addition to the SiSEC2008 datasets. The source separation problems were split into five tasks, and the results for each task…
We consider the local Gaussian modeling framework for under-determined convolutive audio source separation, where the spatial image of each source is modeled as a zero-mean Gaussian variable with full-rank time- and frequency-dependent covariance. We investigate two methods to improve the accuracy of parameter estimation, based on the use of local observed…
To implement emerging second-screen TV applications, a technique is needed that ensures fast and accurate synchronization of media components streamed over different networks to different rendering devices. One approach of great value is to exploit the unmodified audio stream of the original media and compare it to a reference version.…
This paper considers the blind separation of the harmonic and percussive components of multichannel music signals. We model the contribution of each source to all mixture channels in the time-frequency domain via a spatial covariance matrix, which encodes its spatial characteristics, and a scalar spectral variance, which represents its spectral structure.…
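The spatial covariance and scalar variance model described above can be illustrated with a minimal sketch for a single time-frequency bin of a stereo mixture. It assumes that each source covariance is the product of its scalar variance and its spatial covariance matrix, and that sources are then recovered by multichannel Wiener filtering; the truncated abstract does not confirm the separation step, so treat that part as an assumption. All function names and numeric values are hypothetical and chosen only for illustration.

```python
# Hypothetical illustration: names and numbers are assumptions, not taken from the paper.
# Works on one time-frequency bin of a 2-channel (stereo) mixture, pure standard Python.

def scale(v, R):
    """Source covariance: scalar spectral variance v times 2x2 spatial covariance R."""
    return [[v * R[i][j] for j in range(2)] for i in range(2)]

def add(A, B):
    """Element-wise sum of two 2x2 matrices."""
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(A):
    """Closed-form inverse of a 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def matvec(A, x):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [sum(A[i][k] * x[k] for k in range(2)) for i in range(2)]

def wiener_separate(x, variances, spatial_covs):
    """Estimate each source's stereo image from the mixture vector x.

    Assumed estimator: image_j = Sigma_j (sum_k Sigma_k)^{-1} x,
    with Sigma_j = v_j * R_j.
    """
    covs = [scale(v, R) for v, R in zip(variances, spatial_covs)]
    mix_cov = covs[0]
    for c in covs[1:]:
        mix_cov = add(mix_cov, c)
    mix_inv = inv2(mix_cov)
    return [matvec(matmul(c, mix_inv), x) for c in covs]

# Toy Hermitian, positive-definite spatial covariances for two sources.
R_harmonic = [[1.0, 0.5 + 0.2j], [0.5 - 0.2j, 1.0]]
R_percussive = [[1.0, -0.3j], [0.3j, 1.0]]

# Toy mixture coefficients (left, right) and per-source spectral variances.
x = [0.8 + 0.1j, 0.4 - 0.3j]
images = wiener_separate(x, [2.0, 0.5], [R_harmonic, R_percussive])
```

Because the Wiener gains of all sources sum to the identity matrix, the estimated source images in `images` add back up to the mixture `x`, which is a convenient sanity check for any implementation of this kind.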
This paper addresses a challenging single-channel speech enhancement problem in a real-world environment where the speech signal is corrupted by high-level background noise. While most state-of-the-art algorithms try to estimate the noise spectral power and filter it from the observed one to obtain enhanced speech, the paper discloses another approach inspired from…
We address the problem of blind audio source separation in the under-determined and convolutive case. The contribution of each source to the mixture channels in the time-frequency domain is modeled by a zero-mean Gaussian random vector with a full-rank covariance matrix composed of two terms: a variance which represents the spectral properties of the source…