A Regression Approach to Speech Enhancement Based on Deep Neural Networks

@article{Xu2015ARA,
  title={A Regression Approach to Speech Enhancement Based on Deep Neural Networks},
  author={Yong Xu and Jun Du and Li-Rong Dai and Chin-Hui Lee},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  year={2015},
  volume={23},
  pages={7-19}
}
In contrast to the conventional minimum mean square error (MMSE)-based noise reduction techniques, we propose a supervised method to enhance speech by means of finding a mapping function between noisy and clean speech signals based on deep neural networks (DNNs). [...] In order to be able to handle a wide range of additive noises in real-world situations, a large training set that encompasses many possible combinations of speech and noise types is first designed.
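The regression scheme summarized above (a DNN that maps noisy log-power spectral features to their clean counterparts under an MSE objective) can be sketched as follows. The network size, feature extraction, and all variable names are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_power_spectrum(frames, n_fft=256):
    """Log-power spectral features, the input/target domain used by
    regression-based DNN enhancement (simplified illustrative pipeline)."""
    spec = np.fft.rfft(frames * np.hanning(frames.shape[-1]), n=n_fft)
    return np.log(np.abs(spec) ** 2 + 1e-10)

# Toy noisy/clean frame pair: white noise added to a sine "speech" frame.
t = np.arange(256) / 8000.0
clean = np.sin(2 * np.pi * 440.0 * t)[None, :]
noisy = clean + 0.3 * rng.standard_normal(clean.shape)

x = log_power_spectrum(noisy)   # DNN input:  noisy log-power spectrum
y = log_power_spectrum(clean)   # DNN target: clean log-power spectrum

# One-hidden-layer regression net (weights random here; training would
# minimise the MSE below with back-propagation over a large noisy set).
d = x.shape[1]
W1, b1 = 0.01 * rng.standard_normal((d, 64)), np.zeros(64)
W2, b2 = 0.01 * rng.standard_normal((64, d)), np.zeros(d)

h = np.maximum(x @ W1 + b1, 0.0)   # ReLU hidden layer
y_hat = h @ W2 + b2                # linear output: estimated clean log-spectrum
mse = np.mean((y_hat - y) ** 2)    # regression objective
```

At test time, the estimated log-power spectrum is typically combined with the noisy phase to resynthesise the enhanced waveform.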
Normalized Features for Improving the Generalization of DNN Based Speech Enhancement
TLDR
This work employs the a priori signal-to-noise ratio (SNR) and the a posteriori SNR estimated as input features in a deep neural network (DNN) based enhancement scheme and shows that this approach allows ML based speech estimators to generalize quickly to unknown noise types even if only few noise conditions have been seen during training.
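The a priori and a posteriori SNRs used as input features above are classically computed with the decision-directed approach. A minimal sketch, assuming per-frame power spectra and a known noise PSD (function and variable names are my own):

```python
import numpy as np

def snr_features(noisy_psd, noise_psd, alpha=0.98):
    """A posteriori SNR and decision-directed a priori SNR per frame
    (illustrative sketch, frames along axis 0, frequency along axis 1)."""
    snr_post = noisy_psd / np.maximum(noise_psd, 1e-12)
    snr_prio = np.empty_like(snr_post)
    prev_clean_psd = np.zeros(noisy_psd.shape[1])
    for t in range(noisy_psd.shape[0]):
        # Smooth last frame's clean-speech estimate with the ML estimate
        # max(snr_post - 1, 0) from the current frame.
        snr_prio[t] = (alpha * prev_clean_psd / np.maximum(noise_psd[t], 1e-12)
                       + (1.0 - alpha) * np.maximum(snr_post[t] - 1.0, 0.0))
        # Wiener-style clean-speech PSD estimate feeds the next frame.
        gain = snr_prio[t] / (1.0 + snr_prio[t])
        prev_clean_psd = (gain ** 2) * noisy_psd[t]
    return snr_post, snr_prio
```

Both quantities are then fed (often log-compressed) as normalized input features to the DNN.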
Improving Statistical Model-Based Speech Enhancement with Deep Neural Networks
TLDR
A DNN is trained to predict speech presence in the input signal, and this information is leveraged to design novel methods for noise tracking and a priori signal-to-noise ratio estimation, which remain the most challenging tasks in conventional systems.
A Maximum Likelihood Approach to Masking-based Speech Enhancement Using Deep Neural Network
TLDR
Experimental results on the TIMIT corpus show the proposed ML-based learning approach can achieve consistent improvements over MMSE-based DNN learning on all evaluation metrics.
Improving Deep Neural Network Based Speech Enhancement in Low SNR Environments
TLDR
A joint framework combining speech enhancement (SE) and voice activity detection (VAD) to increase speech intelligibility in low signal-to-noise-ratio (SNR) environments; experiments demonstrate that the proposed SE approach effectively improves short-time objective intelligibility (STOI).
Improving the Generalizability of Deep Neural Network Based Speech Enhancement
TLDR
This work employs the a priori signal-to-noise ratio (SNR) and the a posteriori SNR estimated by non-ML based algorithms as input features in a deep neural network (DNN) based enhancement scheme and shows that this approach allows ML based speech estimators to generalize quickly to unknown noise types even if only few noise conditions have been seen during training.
A Noise Prediction and Time-Domain Subtraction Approach to Deep Neural Network Based Speech Enhancement
Deep neural networks (DNNs) have recently been successfully applied to the speech enhancement task; however, the low signal-to-noise ratio (SNR) performance of DNN-based speech enhancement systems [...]
Speech enhancement based on noise classification and deep neural network
TLDR
Deep neural networks have recently been successfully adopted as regression models in speech enhancement, and this work presents a new DNN model for speech enhancement based on noise classification.
Perceptually Guided Speech Enhancement Using Deep Neural Networks
TLDR
This paper proposes a new deep neural network based enhancement approach by incorporating a speech perception model into the loss function, and uses the short-time objective intelligibility metric in the loss in addition to the mean squared error.
Multi-objective noisy-based deep feature loss for speech enhancement
  • Rafal Pilarczyk, W. Skarbek
  • Computer Science, Engineering
  • Symposium on Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments (WILGA)
  • 2019
TLDR
This work shows that the use of only deep features in the loss function allows a significant improvement in the measurement of speech signal quality, and believes that deep-feature loss could help in the optimization of functions difficult to differentiate.
Reinforcement Learning Based Speech Enhancement for Robust Speech Recognition
TLDR
The proposed reinforcement learning (RL) algorithm to optimize the SE model based on the recognition results can effectively improve the ASR results, with notable 12.40% and 19.23% error rate reductions at signal-to-noise ratios (SNRs) of 0 dB and 5 dB, respectively.

References

Showing 1-10 of 60 references
An Experimental Study on Speech Enhancement Based on Deep Neural Networks
TLDR
This letter presents a regression-based speech enhancement framework using deep neural networks (DNNs) with a multiple-layer deep architecture that tends to achieve significant improvements in terms of various objective quality measures.
An investigation of deep neural networks for noise robust speech recognition
TLDR
The noise robustness of DNN-based acoustic models can match state-of-the-art performance on the Aurora 4 task without any explicit noise compensation and can be further improved by incorporating information about the environment into DNN training using a new method called noise-aware training.
Investigation of Speech Separation as a Front-End for Noise Robust Speech Recognition
TLDR
This work performs an in-depth evaluation of such techniques as a front-end for noise-robust automatic speech recognition (ASR) and proposes a diagonal feature discriminant linear regression (dFDLR) adaptation that can be performed on a per-utterance basis for ASR systems employing deep neural networks and HMMs.
Speech enhancement with weighted denoising auto-encoder
TLDR
A novel speech enhancement method with a Weighted Denoising Auto-encoder (WDA) is proposed, which achieves a similar amount of noise reduction in both white and colored noise while introducing less distortion to the speech signal.
Feature enhancement by bidirectional LSTM networks for conversational speech recognition in highly non-stationary noise
TLDR
It is shown that BLSTM networks are well-suited for mapping from noisy to clean speech features and that the obtained recognition performance gain is partly complementary to improvements via additional techniques such as speech enhancement by non-negative matrix factorization and probabilistic feature generation by Bottleneck-BLSTM networks.
Ideal ratio mask estimation using deep neural networks for robust speech recognition
  • A. Narayanan, Deliang Wang
  • Computer Science
  • 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
  • 2013
TLDR
The proposed feature enhancement algorithm estimates a smoothed ideal ratio mask (IRM) in the Mel frequency domain using deep neural networks and a set of time-frequency unit level features that has previously been used to estimate the ideal binary mask.
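The ideal ratio mask named above has a standard closed form in each time-frequency unit. A minimal illustrative sketch (the smoothing and Mel-domain details of the paper are omitted):

```python
import numpy as np

def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    """Ideal ratio mask per time-frequency unit:
    IRM = (S / (S + N)) ** beta, with beta = 0.5 a common choice."""
    return (speech_power / (speech_power + noise_power + 1e-12)) ** beta
```

The mask lies in [0, 1] and is applied multiplicatively to the noisy time-frequency representation to attenuate noise-dominated units.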
Recurrent Neural Networks for Noise Reduction in Robust ASR
TLDR
This work introduces a model which uses a deep recurrent autoencoder neural network to denoise input features for robust ASR, demonstrates that the model is competitive with existing feature denoising approaches on the Aurora2 task, and outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.
On Training Targets for Supervised Speech Separation
TLDR
Results in various test conditions reveal that the two ratio mask targets, the IRM and the FFT-MASK, outperform the other targets in terms of objective intelligibility and quality metrics, and that masking based targets, in general, are significantly better than spectral envelope based targets.
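The targets compared above can be written down explicitly. The definitions below are the commonly used ones, sketched for speech and noise magnitudes assumed in-phase (so the mixture magnitude is simply S + N), which is an idealisation:

```python
import numpy as np

def training_targets(S, N):
    """Common supervised-separation targets for speech magnitude S and
    noise magnitude N in each T-F unit (illustrative definitions)."""
    Y = S + N                                        # mixture magnitude
    ibm = (S > N).astype(float)                      # ideal binary mask
    irm = np.sqrt(S**2 / (S**2 + N**2 + 1e-12))      # ideal ratio mask
    smm = S / np.maximum(Y, 1e-12)                   # FFT-MASK / spectral
                                                     # magnitude mask
    return ibm, irm, smm
```

The binary mask makes a hard keep/discard decision per unit, while the two ratio-style masks provide soft attenuation, which is what the comparison above finds advantageous.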
Joint noise adaptive training for robust automatic speech recognition
  • A. Narayanan, Deliang Wang
  • Computer Science
  • 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2014
TLDR
By formulating separation as a supervised mask estimation problem, a unified DNN framework is developed that jointly improves separation and acoustic modeling and improves performance on the Aurora-4 dataset.
Towards Scaling Up Classification-Based Speech Separation
TLDR
This work proposes to learn more linearly separable and discriminative features from raw acoustic features and train linear SVMs, which are much easier and faster to train than kernel SVMs.