Publications
Deep clustering: Discriminative embeddings for segmentation and separation
TLDR
Preliminary experiments on single-channel mixtures of multiple speakers show that a speaker-independent model trained on two-speaker mixtures can improve signal quality for mixtures of held-out speakers by an average of 6 dB, and the same model does surprisingly well on three-speaker mixtures.
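To make the idea behind the title concrete, here is a minimal NumPy sketch of the deep-clustering affinity objective, assuming unit-norm embeddings V of shape (TF, D) and ideal binary speaker assignments Y of shape (TF, C); the variable names and shapes are illustrative, not the paper's exact implementation.

    import numpy as np

    def deep_clustering_loss(V, Y):
        """Affinity-based deep clustering loss ||V V^T - Y Y^T||_F^2,
        expanded so that no (TF x TF) matrix is ever formed.

        V : (TF, D) unit-norm embedding for each time-frequency bin
        Y : (TF, C) one-hot ideal speaker assignment per bin
        """
        # ||V V^T||_F^2 = ||V^T V||_F^2, and similarly for the cross and Y terms
        vtv = V.T @ V          # (D, D)
        vty = V.T @ Y          # (D, C)
        yty = Y.T @ Y          # (C, C)
        return (vtv ** 2).sum() - 2.0 * (vty ** 2).sum() + (yty ** 2).sum()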
Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error
TLDR
This article reports significant gains in recognition performance and model compactness from minimum classification error (MCE) discriminative training applied to HMMs, in the context of three challenging large-vocabulary speech recognition tasks.
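As a rough illustration of the criterion this paper builds on, below is a generic sketch of the smoothed MCE loss for a single token; the lattice-based large-vocabulary formulation in the paper is more involved, and the parameter names and defaults here are illustrative assumptions.

    import numpy as np

    def mce_loss(scores, correct, eta=1.0, gamma=1.0, theta=0.0):
        """Smoothed minimum classification error (MCE) loss for one token.

        scores  : (K,) discriminant scores g_k(x), e.g. log-likelihoods per class
        correct : index of the correct class
        The misclassification measure compares the correct-class score with a
        soft maximum over competitors; a sigmoid turns it into a smooth 0/1 loss.
        """
        g_correct = scores[correct]
        competitors = np.delete(scores, correct)
        # soft maximum of the competing scores (eta -> inf recovers the hard max)
        g_comp = (1.0 / eta) * np.log(np.mean(np.exp(eta * competitors)))
        d = -g_correct + g_comp                              # misclassification measure
        return 1.0 / (1.0 + np.exp(-gamma * d - theta))      # sigmoid loss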
SDR – Half-baked or Well Done?
TLDR
It is argued here that the signal-to-distortion ratio (SDR) implemented in the BSS_eval toolkit has generally been improperly used and abused, especially in the case of single-channel separation, resulting in misleading results.
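For context, the scale-invariant SDR that this paper advocates as a better-behaved measure can be sketched as follows; the mean removal and eps terms are implementation choices here, not prescribed by the paper.

    import numpy as np

    def si_sdr(estimate, reference, eps=1e-8):
        """Scale-invariant SDR in dB for single-channel signals.

        The reference is rescaled by the optimal least-squares gain, so the
        measure cannot be inflated simply by changing the output level.
        """
        reference = reference - reference.mean()
        estimate = estimate - estimate.mean()
        alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
        target = alpha * reference
        noise = estimate - target
        return 10.0 * np.log10((target ** 2).sum() / ((noise ** 2).sum() + eps))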
Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks
TLDR
A phase-sensitive objective function based on the signal-to-noise ratio (SNR) of the reconstructed signal is developed, and experiments show that it yields uniformly better results in terms of signal-to-distortion ratio (SDR).
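A minimal sketch of a phase-sensitive spectrum-approximation loss of the kind described here, assuming complex STFT inputs; the variable names and the mean reduction are illustrative.

    import numpy as np

    def phase_sensitive_loss(mask, mix_stft, src_stft):
        """Phase-sensitive approximation (PSA) loss on complex STFTs.

        mask     : (T, F) real-valued mask predicted by the network
        mix_stft : (T, F) complex STFT of the mixture
        src_stft : (T, F) complex STFT of the target source
        The target is |S| cos(theta_S - theta_X): the source magnitude projected
        onto the mixture phase, so the loss accounts for phase mismatch.
        """
        phase_diff = np.angle(src_stft) - np.angle(mix_stft)
        psa_target = np.abs(src_stft) * np.cos(phase_diff)
        return ((mask * np.abs(mix_stft) - psa_target) ** 2).mean()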
Single-Channel Multi-Speaker Separation Using Deep Clustering
TLDR
This paper significantly improves upon the baseline system performance by incorporating better regularization, larger temporal context, and a deeper architecture, culminating in an overall improvement in signal-to-distortion ratio (SDR) of 10.3 dB over the baseline, and produces unprecedented performance on a challenging speech separation task.
Full-Capacity Unitary Recurrent Neural Networks
TLDR
This work provides a theoretical argument to determine if a unitary parameterization has restricted capacity, and shows how a complete, full-capacity unitary recurrence matrix can be optimized over the differentiable manifold of unitary matrices.
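One standard way to optimize directly over the unitary manifold, in the spirit of this work, is a Cayley-transform retraction of the skew-Hermitian projection of the gradient; this sketch assumes a plain gradient step with an illustrative step size, not the paper's exact optimizer.

    import numpy as np

    def cayley_unitary_update(W, G, lr=0.01):
        """One Riemannian gradient step that keeps W exactly unitary.

        W : (n, n) complex unitary recurrence matrix
        G : (n, n) Euclidean gradient of the loss with respect to W
        A Cayley transform of the skew-Hermitian projection of the gradient
        retracts the step back onto the manifold of unitary matrices.
        """
        A = G @ W.conj().T - W @ G.conj().T        # skew-Hermitian direction
        I = np.eye(W.shape[0], dtype=W.dtype)
        return np.linalg.solve(I + (lr / 2.0) * A, (I - (lr / 2.0) * A) @ W)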
Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR
TLDR
It is demonstrated that LSTM speech enhancement, even when used 'naively' as front-end processing, delivers competitive results on the CHiME-2 speech recognition task.
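A minimal PyTorch sketch of an LSTM mask-estimation front-end of the kind evaluated here; the layer sizes, sigmoid mask, and magnitude-domain masking are illustrative assumptions rather than the paper's exact configuration.

    import torch
    import torch.nn as nn

    class LSTMMaskEstimator(nn.Module):
        """Minimal LSTM enhancement front-end: predicts a magnitude-domain mask
        applied to the noisy spectrogram before ASR feature extraction."""

        def __init__(self, n_freq=257, hidden=256, layers=2):
            super().__init__()
            self.lstm = nn.LSTM(n_freq, hidden, num_layers=layers, batch_first=True)
            self.proj = nn.Linear(hidden, n_freq)

        def forward(self, noisy_mag):                 # (batch, frames, n_freq)
            h, _ = self.lstm(noisy_mag)
            mask = torch.sigmoid(self.proj(h))        # mask in [0, 1]
            return mask * noisy_mag                   # enhanced magnitude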
Discriminatively trained recurrent neural networks for single-channel speech separation
TLDR
The results confirm the importance of fine-tuning the feature representation for DNN training and show consistent improvements by discriminative training, whereas long short-term memory recurrent DNNs obtain the overall best results.
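One common form of such a discriminative objective contrasts the fit to the target source with the resemblance to the interfering source; this sketch and its weighting gamma are illustrative, and the paper's exact formulation may differ.

    def discriminative_sa_loss(est_mag, target_mag, interf_mag, gamma=0.1):
        """Signal-approximation loss with a discriminative term: fit the target
        source while penalizing resemblance to the interfering source.

        est_mag, target_mag, interf_mag : (T, F) magnitude spectrograms
        """
        fit = ((est_mag - target_mag) ** 2).mean()
        confusion = ((est_mag - interf_mag) ** 2).mean()
        return fit - gamma * confusion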
Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation
TLDR
It is found that simply encoding inter-microphone phase patterns as additional input features during deep clustering provides a significant improvement in separation performance, even with random microphone array geometry.
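The phase features in question can be as simple as inter-microphone phase differences encoded as cosine/sine pairs; this sketch for a single microphone pair is illustrative.

    import numpy as np

    def ipd_features(stft_ref, stft_other):
        """Inter-microphone phase-difference (IPD) features for one mic pair.

        stft_ref, stft_other : (T, F) complex STFTs of two microphones
        Returns cos/sin of the phase difference, which can be stacked with
        log-magnitude features as extra network inputs.
        """
        ipd = np.angle(stft_other) - np.angle(stft_ref)
        return np.stack([np.cos(ipd), np.sin(ipd)], axis=-1)   # (T, F, 2)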
Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks
TLDR
It is shown that using a single mask across microphones for covariance prediction with minima-limited post-masking yields the best result in terms of signal-level quality measures and speech recognition word error rates in a mismatched training condition.
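A compact sketch of a mask-driven MVDR beamformer of the kind studied here, using a single speech mask shared across microphones to form the spatial covariances; the eigenvector steering choice and regularization terms are assumptions, not necessarily the paper's exact variant.

    import numpy as np

    def mask_based_mvdr(stfts, speech_mask, eps=1e-8):
        """MVDR beamformer driven by a single predicted speech mask.

        stfts       : (M, T, F) complex STFTs for M microphones
        speech_mask : (T, F) mask in [0, 1], shared across microphones
        Speech/noise spatial covariances are mask-weighted outer products;
        the steering vector is the principal eigenvector of the speech covariance.
        """
        M, T, F = stfts.shape
        w = np.zeros((M, F), dtype=np.complex128)
        for f in range(F):
            Y = stfts[:, :, f]                       # (M, T)
            ms = speech_mask[:, f]                   # (T,)
            mn = 1.0 - ms
            R_s = (Y * ms) @ Y.conj().T / (ms.sum() + eps)
            R_n = (Y * mn) @ Y.conj().T / (mn.sum() + eps)
            _, vecs = np.linalg.eigh(R_s)
            d = vecs[:, -1]                          # principal eigenvector
            num = np.linalg.solve(R_n + eps * np.eye(M), d)
            w[:, f] = num / (d.conj() @ num + eps)
        # apply beamformer per frequency: output is w^H y, shape (T, F)
        return np.einsum('mf,mtf->tf', w.conj(), stfts)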