• Publications
  • Influence
Convolutive Speech Bases and Their Application to Supervised Speech Separation
  • P. Smaragdis
  • Computer Science
    IEEE Transactions on Audio, Speech, and Language…
  • 2007
The model proposed is a convolutive version of the nonnegative matrix factorization algorithm, which is very well suited for intuitively and efficiently representing magnitude spectra and its application on simultaneous speakers separation from monophonic recordings is presented.
Non-negative matrix factorization for polyphonic music transcription
This work presents a methodology for analyzing polyphonic musical passages comprised of notes that exhibit a harmonically fixed spectral profile (such as piano notes), which results in a very simple and compact system that is not knowledge-based, but rather learns notes by observation.
Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs
In this paper we present an extension to the Non-Negative Matrix Factorization algorithm which is capable of identifying components with temporal structure. We demonstrate the use of this algorithm
Singing-voice separation from monaural recordings using robust principal component analysis
This paper proposes using robust principal component analysis for singing-voice separation from music accompaniment using a binary time-frequency masking method and shows that this method can achieve around 1~1.4 dB higher GNSDR compared with two state-of-the-art approaches without using prior training or requiring particular features.
Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation
Joint optimization of masking functions and deep recurrent neural networks for monaural source separation tasks, including speech separation, singing voice separation, and speech denoising, and a discriminative criterion for training neural networks to further enhance the separation performance are explored.
Deep learning for monaural speech separation
The joint optimization of the deep learning models (deep neural networks and recurrent neural networks) with an extra masking layer, which enforces a reconstruction constraint, is proposed to enhance the separation performance of monaural speech separation models.
Speech denoising using nonnegative matrix factorization with priors
A technique for denoising speech using nonnegative matrix factorization (NMF) in combination with statistical speech and noise models is presented and improvements in speech quality across a range of interfering noise types are shown.
Supervised and Semi-supervised Separation of Sounds from Single-Channel Mixtures
A sparse latent variable model that can learn sounds based on their distribution of time/ frequency energy is presented that can be used to extract known types of sounds from mixtures in two scenarios.
Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization
This paper proposes a novel speech enhancement method that is based on a Bayesian formulation of NMF (BNMF), and compares the performance of the developed algorithms with state-of-the-art speech enhancement schemes using various objective measures.