A new approach to convolutive blind source separation (BSS) is proposed that explicitly exploits the second-order nonstationarity of signals and operates in the frequency domain. The algorithm incorporates a penalty function into the cross-power-spectrum-based cost function, thereby converting the separation problem into a joint diagonalization problem…
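The joint-diagonalization idea can be made concrete for a single frequency bin. The NumPy sketch below assumes a hypothetical instantaneous complex 2x2 mixing matrix `A` for that bin and synthetic cross-power matrices R_k = A D_k A^H, where the diagonal source powers D_k change from block to block (the nonstationarity). The helper `offdiag_cost` is hypothetical and is not the paper's cost function: it sums the off-diagonal energy of W R_k W^H. That energy vanishes for the true unmixing matrix, but also for W = 0, which is one reason such methods need a constraint or penalty term. The paper's own penalty function is not reproduced here.

```python
import numpy as np

def offdiag_cost(W, R_list):
    """Sum over blocks of the squared off-diagonal energy of W R W^H."""
    total = 0.0
    for R in R_list:
        C = W @ R @ W.conj().T
        off = C - np.diag(np.diag(C))
        total += float(np.sum(np.abs(off) ** 2))
    return total

rng = np.random.default_rng(0)
n_src, n_blocks = 2, 5
# Hypothetical complex mixing matrix for one frequency bin.
A = rng.standard_normal((n_src, n_src)) + 1j * rng.standard_normal((n_src, n_src))

# Nonstationarity: each block has different (diagonal) source powers, so the
# mixture cross-power matrices R_k = A D_k A^H change from block to block.
R_list = [A @ np.diag(rng.uniform(0.5, 2.0, n_src)) @ A.conj().T
          for _ in range(n_blocks)]

cost_true = offdiag_cost(np.linalg.inv(A), R_list)  # exact unmixing matrix
cost_eye = offdiag_cost(np.eye(n_src), R_list)      # no separation at all
print(f"{cost_true:.2e} {cost_eye:.2e}")
```

In a convolutive frequency-domain method, a cost of this kind would be evaluated separately in every frequency bin, which leaves the per-bin scaling and permutation ambiguities to be resolved afterwards.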
In this paper we investigate the problem of integrating the complementary audio and visual modalities for speech separation. Rather than relying on the independence criteria used in most blind source separation (BSS) systems, we use visual features extracted from a video signal as additional information to optimize the unmixing matrix. We achieve this by using a…
We consider the data-driven dictionary learning problem. The goal is to seek an over-complete dictionary from which every training signal can be best approximated by a linear combination of only a few codewords. This task is often achieved by iteratively executing two operations: sparse coding and dictionary update. The focus of this paper is on the…
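The two alternating operations described above can be sketched as follows. This is a minimal illustration, not the paper's method: it uses orthogonal matching pursuit (OMP) for the sparse-coding step and the method of optimal directions (MOD, a least-squares fit) for the dictionary-update step, both swapped in as common textbook choices. The function names, the sparsity level `s`, and the synthetic training data are hypothetical.

```python
import numpy as np

def omp(D, x, s):
    """Orthogonal matching pursuit: code x with at most s columns (atoms) of D."""
    support, coef = [], np.zeros(0)
    residual = x.copy()
    for _ in range(s):
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:  # residual is already (numerically) explained
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    code = np.zeros(D.shape[1])
    code[support] = coef
    return code

def learn_dictionary(X, n_atoms, s, n_iter=20, seed=0):
    """Alternate OMP sparse coding with a MOD (least-squares) dictionary update."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    errors = []
    for _ in range(n_iter):
        # Sparse coding: each training signal may use at most s codewords.
        C = np.column_stack([omp(D, x, s) for x in X.T])
        # Dictionary update (MOD): D = X C^+ minimises ||X - D C||_F for fixed C.
        D = X @ np.linalg.pinv(C)
        errors.append(float(np.linalg.norm(X - D @ C)))
        # Move atom scales into the codes, re-seed unused atoms, renormalise.
        norms = np.linalg.norm(D, axis=0)
        C = C * norms[:, None]
        unused = norms < 1e-12
        D[:, unused] = rng.standard_normal((X.shape[0], int(unused.sum())))
        D /= np.linalg.norm(D, axis=0)
    return D, C, errors

# Hypothetical training data: 200 signals, each built from 2 atoms of a random 8x16 dictionary.
rng = np.random.default_rng(1)
true_D = rng.standard_normal((8, 16))
true_D /= np.linalg.norm(true_D, axis=0)
codes = np.zeros((16, 200))
for j in range(200):
    idx = rng.choice(16, size=2, replace=False)
    codes[idx, j] = rng.standard_normal(2)
X = true_D @ codes

D, C, errors = learn_dictionary(X, n_atoms=16, s=2)
print(f"residual: {errors[0]:.3f} -> {errors[-1]:.3f}")
```

MOD updates all atoms at once. Methods that update atoms one at a time, such as K-SVD, refine the dictionary-update step instead.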
Machine audition is the study of algorithms and systems for the automatic analysis and understanding of sound by machine. It plays an important role in many applications, such as automatic audio indexing for internet search, robust speech recognition in uncontrolled natural environments, untethered audio communication within an intelligent…
  • Wenwu Wang
  • 2008 IEEE International Joint Conference on…
  • 2008
Non-negative sparse coding (NSC) is a powerful technique for low-rank data approximation, and has found several successful applications in signal processing. However, the temporal dependency, which is a vital clue for many realistic signals, has not been taken into account in its conventional model. In this paper, we propose a general framework, i.e.,…
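For reference, the conventional NSC model mentioned above (without any temporal term) can be sketched as minimising 0.5·||V − WH||² + λ·ΣH under nonnegativity. The sketch below is a minimal illustration under that assumption, and it is not the paper's temporal framework. It uses a multiplicative update for H in which λ appears in the denominator, followed by a Euclidean multiplicative update for W and a renormalisation of W's columns to unit length. The penalty weight `lam`, the rank, and the data are hypothetical. Because each column of V is coded independently, nothing in this cost links neighbouring frames. That is the missing temporal dependency the abstract refers to.

```python
import numpy as np

def nsc(V, rank, lam=0.1, n_iter=300, seed=0, eps=1e-12):
    """Frame-by-frame NSC: min 0.5*||V - W H||_F^2 + lam*sum(H) with W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(size=(V.shape[0], rank))
    W /= np.linalg.norm(W, axis=0)
    H = rng.uniform(size=(rank, V.shape[1]))
    for _ in range(n_iter):
        # Multiplicative H update; lam in the denominator shrinks activations.
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)
        # Euclidean multiplicative W update, then unit-norm columns so the
        # sparsity penalty cannot be dodged by shrinking H and inflating W.
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        W /= np.maximum(np.linalg.norm(W, axis=0), eps)
    return W, H

# Hypothetical nonnegative data, e.g. 20 frequency bins x 50 frames of a magnitude spectrogram.
rng = np.random.default_rng(0)
V = rng.uniform(size=(20, 50))
W, H = nsc(V, rank=5)
print(f"relative error: {np.linalg.norm(V - W @ H) / np.linalg.norm(V):.3f}")
```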
Using the convolutive nonnegative matrix factorization (NMF) model due to Smaragdis, we develop a novel algorithm for matrix decomposition based on the squared Euclidean distance criterion. The algorithm features new formally derived learning rules and an efficient update for the reconstructed nonnegative matrix. Performance comparisons in terms of…
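The convolutive model can be written as Λ = Σ_t W_t · (H shifted t columns to the right), and the squared Euclidean criterion is 0.5·||V − Λ||². The sketch below assumes that formulation and uses standard multiplicative updates, each formed as the ratio of the negative to the positive part of the gradient. These updates are not claimed to be the paper's "formally derived learning rules". The sketch also recomputes Λ from scratch each time rather than using the paper's efficient update. Function names, rank, number of lags `T`, and data are hypothetical.

```python
import numpy as np

def shift_right(X, t):
    """Move the columns of X t steps to the right, zero-filling on the left."""
    out = np.zeros_like(X)
    out[:, t:] = X[:, :X.shape[1] - t]
    return out

def shift_left(X, t):
    """Move the columns of X t steps to the left, zero-filling on the right."""
    out = np.zeros_like(X)
    out[:, :X.shape[1] - t] = X[:, t:]
    return out

def reconstruct(W, H):
    """Convolutive NMF model: Lambda = sum_t W[t] @ shift_right(H, t)."""
    return sum(W[t] @ shift_right(H, t) for t in range(W.shape[0]))

def conv_nmf_euclid(V, rank, T, n_iter=200, seed=0, eps=1e-12):
    rng = np.random.default_rng(seed)
    W = rng.uniform(size=(T, V.shape[0], rank))  # one basis matrix per lag
    H = rng.uniform(size=(rank, V.shape[1]))
    errors = []
    for _ in range(n_iter):
        Lam = reconstruct(W, H)
        errors.append(0.5 * float(np.sum((V - Lam) ** 2)))
        # H: ratio of the negative to the positive part of the gradient of
        # 0.5*||V - Lambda||^2, accumulated over all lags.
        num = sum(W[t].T @ shift_left(V, t) for t in range(T))
        den = sum(W[t].T @ shift_left(Lam, t) for t in range(T))
        H *= num / (den + eps)
        # W: every lag is updated against the same reconstruction (new H).
        Lam = reconstruct(W, H)
        for t in range(T):
            Ht = shift_right(H, t)
            W[t] *= (V @ Ht.T) / (Lam @ Ht.T + eps)
    return W, H, errors

rng = np.random.default_rng(0)
V = rng.uniform(size=(16, 60))  # hypothetical nonnegative spectrogram
W, H, errors = conv_nmf_euclid(V, rank=3, T=4)
print(f"cost: {errors[0]:.2f} -> {errors[-1]:.2f}")
```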
A sequential algorithm for the blind separation of a class of periodic source signals is introduced in this paper. The algorithm is based only on second-order statistical information and exploits the assumption that the source signals have distinct periods. Separation is performed by sequentially converging to a solution which in effect diagonalizes the…
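The second-order idea can be illustrated with a batch shortcut rather than the paper's sequential algorithm. If a source has period P, its autocorrelation at lag P equals its variance. An extracting vector w therefore satisfies R_x(P) w = λ R_x(0) w with λ = 1, where R_x(τ) is the lagged covariance of the mixtures. The sketch below assumes the periods are known, uses circular lags over a length that is a multiple of every period, and solves the generalized eigenproblem with plain NumPy. The signals, mixing matrix, and helper names are hypothetical.

```python
import numpy as np

def lagged_cov(X, lag):
    """Circular lagged covariance (1/N) sum_n x(n) x(n+lag)^T, symmetrised."""
    R = X @ np.roll(X, -lag, axis=1).T / X.shape[1]
    return 0.5 * (R + R.T)

def extract_periodic(X, period):
    """Find w whose output y = w^T x repeats with the given period."""
    R0 = lagged_cov(X, 0)
    RP = lagged_cov(X, period)
    # Generalized eigenproblem RP w = lam R0 w, solved as eig(R0^{-1} RP).
    vals, vecs = np.linalg.eig(np.linalg.solve(R0, RP))
    k = int(np.argmin(np.abs(vals.real - 1.0)))  # eigenvalue closest to 1
    w = vecs[:, k].real
    return w @ X

rng = np.random.default_rng(1)
periods = (7, 11)
N = 77 * 100  # multiple of both periods, so circular lags are exact
S = []
for p in periods:
    pattern = rng.standard_normal(p)
    pattern -= pattern.mean()  # zero-mean periodic pattern
    S.append(np.tile(pattern, N // p))
S = np.array(S)
A = rng.standard_normal((2, 2))  # hypothetical mixing matrix
X = A @ S

y = extract_periodic(X, 7)
corr = abs(np.corrcoef(y, S[0])[0, 1])
print(f"|corr| with the period-7 source: {corr:.4f}")
```

A sequential scheme would extract one source like this, remove it from the mixtures, and repeat for the next period.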
We explore the permutation problem of frequency-domain blind source separation (BSS). Based on a performance analysis of three approaches (exploiting spectral continuity, exploiting time envelope structure, and beamforming alignment), we present a new hybrid method that incorporates a psychoacoustic filtering process for the misaligned permutations unable to…
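Of the three approaches named above, the time-envelope approach is the simplest to sketch. The code below is a minimal illustration, not the paper's hybrid or psychoacoustic method. It assumes the amplitude envelope of each separated output in each frequency bin is already available as `env[f, j, :]`, and it fixes the labelling of the first bin. For each later bin, it picks the permutation whose envelopes correlate best with a running centroid of the already-aligned bins. The function name and the synthetic data are hypothetical.

```python
import numpy as np
from itertools import permutations

def align_permutations(env):
    """env[f, j, :] is the amplitude envelope of output j in frequency bin f.
    Returns, per bin, a permutation p such that output p[i] joins class i."""
    n_bins, n_src, _ = env.shape
    perms = [tuple(range(n_src))]
    centroid = env[0].copy()  # bin 0 fixes the labelling
    for f in range(1, n_bins):
        def score(p):
            return sum(np.corrcoef(centroid[i], env[f, p[i]])[0, 1] for i in range(n_src))
        best = max(permutations(range(n_src)), key=score)
        perms.append(best)
        centroid += env[f, list(best)]  # accumulate the aligned envelopes
    return perms

# Hypothetical data: two source envelopes, randomly swapped and rescaled per bin.
rng = np.random.default_rng(0)
n_bins, n_src, n_frames = 32, 2, 200
base = np.abs(rng.standard_normal((n_src, n_frames)))
true_perm = [tuple(int(k) for k in rng.permutation(n_src)) for _ in range(n_bins)]
env = np.empty((n_bins, n_src, n_frames))
for f, p in enumerate(true_perm):
    noisy = base + 0.3 * np.abs(rng.standard_normal((n_src, n_frames)))
    env[f] = rng.uniform(0.5, 2.0) * noisy[list(p)]

perms = align_permutations(env)
```

Correlation is insensitive to the per-bin gain, which is why envelopes rather than raw magnitudes are compared. The exhaustive search over n! permutations is fine for a handful of sources. A bin whose envelope correlation is ambiguous is the kind of misaligned case a hybrid method would hand to another criterion.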
The problem of blind source separation (BSS) is investigated. Under the assumption that the time-frequency (TF) distributions of the input sources do not overlap, a quadratic TF representation is used to exploit the sparsity of the statistically nonstationary sources. However, separation performance is shown to be limited by the selection of a certain…
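The non-overlap assumption can be illustrated as follows. At a TF point owned by a single source i, the ratio of the two mixture transforms equals A[1, i] / A[0, i], so clustering these ratios recovers the mixing columns up to scale. The sketch below makes several assumptions that are not taken from the paper. It uses a linear Hann-windowed STFT in place of a quadratic TF representation, a 2x2 instantaneous mixture of two tones, a hypothetical energy threshold of 0.1 times the maximum for selecting single-source points, and a largest-gap split for the two-cluster step.

```python
import numpy as np

def stft(x, frame=256, hop=128):
    """Plain Hann-windowed STFT; rows are frames, columns are frequency bins."""
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

# Hypothetical sources whose TF supports do not overlap: two tones.
fs = 8000
t = np.arange(fs) / fs
S = np.vstack([np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 1250 * t)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])  # instantaneous 2x2 mixing
x1, x2 = A @ S
X1, X2 = stft(x1), stft(x2)

# At a TF point owned by source i, X2 / X1 = A[1, i] / A[0, i].
energy = np.abs(X1) ** 2 + np.abs(X2) ** 2
mask = energy > 0.1 * energy.max()  # keep only high-energy points
ratios = np.sort(np.real(X2[mask] / X1[mask]))

# Two clusters: split the sorted ratios at their largest gap.
split = int(np.argmax(np.diff(ratios))) + 1
est = np.array([np.median(ratios[:split]), np.median(ratios[split:])])

# Columns of A up to scale (first entry fixed to 1), then invert.
A_hat = np.vstack([np.ones(2), est])
S_hat = np.linalg.solve(A_hat, np.vstack([x1, x2]))
print(np.round(est, 3))
```

The selection threshold is exactly the kind of parameter the abstract flags: set it too low and points where the sources overlap contaminate the clusters, set it too high and too few points remain.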
The DCASE Challenge 2016 includes tasks for Acoustic Scene Classification (ASC), Acoustic Event Detection (AED), and audio tagging. Since 2006, Deep Neural Networks (DNNs) have been widely applied to computer vision, speech recognition, and natural language processing tasks. In this paper, we provide DNN baselines for the DCASE Challenge 2016. In Task 1 we…