Paris Smaragdis

Learn More
In this paper we present a methodology for analyzing polyphonic musical passages comprised by notes that exhibit a harmonically fixed spectral profile (such as piano notes). Taking advantage of this unique note structure we can model the audio content of the musical passage by a linear basis transform and use non-negative matrix decomposition methods to(More)
Monaural source separation is useful for many real-world applications though it is a challenging problem. In this paper, we study deep learning for monaural speech separation. We propose the joint optimization of the deep learning models (deep neural networks and recurrent neural networks) with an extra masking layer, which enforces a reconstruction(More)
Reducing the interference noise in a monaural noisy speech signal has been a challenging task for many years. Compared to traditional unsupervised speech enhancement methods, e.g., Wiener filtering, supervised approaches, such as algorithms based on hidden Markov models (HMM), lead to higher-quality enhanced speech signals. However, the main practical(More)
Separating singing voices from music accompaniment is an important task in many applications, such as music information retrieval, lyric recognition and alignment. Music accompaniment can be assumed to be in a low-rank subspace, because of its repetition structure; on the other hand, singing voices can be regarded as relatively sparse within songs. In this(More)
In this paper, we present a convolutive basis decomposition method and its application on simultaneous speakers separation from monophonic recordings. The model we propose is a convolutive version of the nonnegative matrix factorization algorithm. Due to the nonnegativity constraint this type of coding is very well suited for intuitively and efficiently(More)
Monaural source separation is important for many real world applications. It is challenging because, with only a single channel of information available, without any constraints, an infinite number of solutions are possible. In this paper, we explore joint optimization of masking functions and deep recurrent neural networks for monaural source separation(More)
This paper presents a family of probabilistic latent variable models that can be used for analysis of nonnegative data. We show that there are strong ties between nonnegative matrix factorization and this family, and provide some straightforward extensions which can help in dealing with shift invariances, higher-order decompositions and sparsity(More)
In this paper we present an extension to the Non-Negative Matrix Factorization algorithm which is capable of identifying components with temporal structure. We demonstrate the use of this algorithm in the magnitude spectrum domain, where we employ it to perform extraction of multiple sound objects from a single channel auditory scene. International Congress(More)
In this paper we describe a methodology for model-based single channel separation of sounds. We present a sparse latent variable model that can learn sounds based on their distribution of time/frequency energy. This model can then be used to extract known types of sounds from mixtures in two scenarios. One being the case where all sound types in the mixture(More)