Hakan Erdogan

Learn More
Separation of speech embedded in non-stationary interference is a challenging problem that has recently seen dramatic improvements using deep network-based methods. Previous work has shown that estimating a masking function to be applied to the noisy spectrum is a viable approach that can be improved by using a signal-approximation based objective function.(More)
Compressed sensing is a developing field aiming at reconstruction of sparse signals acquired in reduced dimensions, which make the recovery process under-determined. The required solution is the one with minimum l0 norm due to sparsity, however it is not practical to solve the l0 minimization problem. Commonly used techniques include l1 minimization, such(More)
We evaluate some recent developments in recurrent neural network (RNN) based speech enhancement in the light of noise-robust automatic speech recognition (ASR). The proposed framework is based on Long Short-Term Memory (LSTM) RNNs which are discriminatively trained according to an optimal speech reconstruction objective. We demonstrate that LSTM speech(More)
In this paper, a novel approach for single channel source separation (SCSS) using a deep neural network (DNN) architecture is introduced. Unlike previous studies in which DNN and other classifiers were used for classifying time-frequency bins to obtain hard masks for each source, we use the DNN to classify estimated source spectra to check for their(More)
In this work, we introduce a new discriminative training method for nonnegative dictionary learning. The new method can be used in single channel source separation (SCSS) applications. In SCSS, nonnegative matrix factorization (NMF) is used to learn a dictionary (a set of basis vectors) for each source in the magnitude spectrum domain. The trained(More)
Linear Discriminant Analysis (LDA) aims to transform an original feature space to a lower dimensional space with as little loss in discrimination as possible. We introduce a novel LDA matrix computation that incorporates confusability information between classes into the transform. Our goal is to improve discrimination in LDA. In conventional LDA, a between(More)
Recovery of sparse signals from compressed measurements constitutes an l0 norm minimization problem, which is unpractical to solve. A number of sparse recovery approaches have appeared in the literature, including l1 minimization techniques, greedy pursuit algorithms, Bayesian methods and nonconvex optimization techniques among others. This manuscript(More)
A single channel speech-music separation algorithm based on nonnegative matrix factorization (NMF) with spectral masks is proposed in this work. The proposed algorithm uses training data of speech and music signals with nonnegative matrix factorization followed by masking to separate the mixed signal. In the training stage, NMF uses the training data to(More)
In this paper, we present a method for incremental on-line adaptation based on feature space Maximum Likelihood Linear Regression (FMLLR) for telephony speech recognition applications. We explain how to incorporate a feature space MLLR transform into a stack decoder and perform on-line adaptation. The issues discussed are as follows: collecting adaptation(More)
Automatic name dialing is a practical and interesting application of speech recognition on telephony systems. The IBM name recognition system is a large vocabulary, speaker independent system currently in use for reaching IBM employees in the United States. In this paper, we present some innovative algorithms that improve name recognition accuracy. Unlike(More)