Learn More
Monaural source separation is useful for many real-world applications though it is a challenging problem. In this paper, we study deep learning for monaural speech separation. We propose the joint optimization of the deep learning models (deep neural networks and recurrent neural networks) with an extra masking layer, which enforces a reconstruction(More)
Monaural source separation is important for many real world applications. It is challenging since only single channel information is available. In this paper, we explore using deep recurrent neural networks for singing voice separation from monaural recordings in a supervised setting. Deep recurrent neural networks with different temporal connections are(More)
We propose a novel extension of Nonnegative Matrix Factorization (NMF) that models a signal with multiple local dictionaries activated sparsely. This set of local dictionaries for a source, e.g., speech, disjointly constitute a superset that is more discriminative than an ordinary NMF dictionary, because its local structures represent the source's manifold(More)
We address a problem of separating drum sources from monaural mixtures of polyphonic music containing various pitched instruments as well as drums. We consider a spectrogram of music, described by a matrix where each row is associated with intensities of a frequency over time. We employ a joint decomposition to several spectrogram matrices that include two(More)
We address a problem of separating drums from polyphonic music containing various pitched instruments as well as drums. Nonnegative matrix factorization (NMF) was successfully applied to spectrograms of music to learn basis vectors, followed by support vector machine (SVM) to classify basis vectors into ones associated with drums (rhythmic source) only and(More)
In this paper we present some experiments using a deep learning model for speech denoising. We propose a very lightweight procedure that can predict clean speech spectra when presented with noisy speech inputs, and we show how various parameter choices impact the quality of the denoised signal. Through our experiments we conclude that such a structure can(More)
Based on the assumption that there exists a neu-ral network that efficiently represents a set of Boolean functions between all binary inputs and outputs, we propose a process for developing and deploying neural networks whose weight parameters , bias terms, input, and intermediate hidden layer output signals, are all binary-valued, and require only basic(More)
An unsupervised method is proposed aiming at extracting rhythmic sources from commercial polyphonic music whose number of channels is limited to one. Commercial music signals are not usually provided with more than two channels while they often contain multiple instruments including singing voice. Therefore, instead of using conventional ways, such as(More)
We present two complementary topic models to address the analysis of mixture data lying on manifolds. First, we propose a quantization method with an additional mid-layer latent variable, which selects only data points that best preserve the manifold structure of the input data. In order to address the case of modeling all the in-between parts of that(More)
In this paper we present a method for polyphonic music source separation from their monaural mixture, where the underlying assumption is that the harmonic structure of a musical instrument remains roughly the same even if it is played at various pitches and is recorded in various mixing environments. We incorporate with nonneg-ativity, shift-invariance, and(More)