Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement
@inproceedings{Xu2015MultiobjectiveLA,
  title={Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement},
  author={Yong Xu and Jun Du and Zhen Huang and Lirong Dai and Chin-Hui Lee},
  booktitle={INTERSPEECH},
  year={2015}
}
We propose a multi-objective framework to learn both secondary targets not directly related to the intended task of speech enhancement (SE) and the primary target of the clean log-power spectra (LPS) features to be used directly for constructing the enhanced speech signals. In deep neural network (DNN) based SE, we introduce an auxiliary structure to learn secondary continuous features, such as mel-frequency cepstral coefficients (MFCCs), and categorical information, such as the ideal binary…
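The multi-objective idea in the abstract can be sketched as a single training loss that combines the primary LPS error with auxiliary errors on the secondary targets. The abstract is truncated here, so the squared-error form, the function names, and the weights `alpha` and `beta` below are illustrative assumptions rather than the authors' exact formulation. The mask helper uses the common local-SNR definition of a binary mask, which is also an assumption about the categorical target.

```python
# Hypothetical sketch of a multi-objective loss for DNN-based speech enhancement.
# All names, the squared-error form, and the weights are assumptions.

def mse(pred, target):
    """Mean squared error between two equal-length feature vectors."""
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def ideal_binary_mask(clean_power, noise_power, threshold=1.0):
    """Return 1.0 where clean power exceeds threshold * noise power, else 0.0.

    threshold=1.0 corresponds to a 0 dB local SNR criterion (assumed here).
    """
    return [1.0 if c > threshold * n else 0.0
            for c, n in zip(clean_power, noise_power)]


def multi_objective_loss(pred, target, alpha=0.1, beta=0.1):
    """Primary LPS error plus weighted auxiliary MFCC and mask errors.

    pred and target are dicts with keys "lps", "mfcc", and "ibm".
    alpha and beta are illustrative weights, not values from the paper.
    """
    primary = mse(pred["lps"], target["lps"])
    aux_mfcc = mse(pred["mfcc"], target["mfcc"])
    aux_ibm = mse(pred["ibm"], target["ibm"])
    return primary + alpha * aux_mfcc + beta * aux_ibm


# Toy single-frame example with two feature bins per target.
target = {
    "lps": [1.0, 2.0],
    "mfcc": [0.5, 0.5],
    "ibm": ideal_binary_mask([4.0, 1.0], [1.0, 2.0]),
}
pred = {"lps": [1.0, 4.0], "mfcc": [0.5, 1.5], "ibm": [1.0, 1.0]}
loss = multi_objective_loss(pred, target, alpha=0.5, beta=0.5)
```

Setting `alpha` and `beta` to zero reduces the sketch to a plain LPS regression loss, which makes it easy to compare a single-objective baseline against the multi-objective variant.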
86 Citations
Robust Speech Recognition based on Multi-Objective Learning with GRU Network
- Computer Science
- 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
- 2019
The experimental results show that the new multi-objective scheme with joint feature mapping and the posterior probability learning method improves the performance of SE and significantly lowers the Character Error Rate of the AM compared to the baseline deep neural network (DNN).
SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement
- Computer Science
- INTERSPEECH
- 2016
Experimental results demonstrate that SNR-based progressive learning can effectively improve perceptual evaluation of speech quality and short-time objective intelligibility in low-SNR environments, and reduce the number of model parameters by 50% compared with the DNN baseline system.
Multiple-target deep learning for LSTM-RNN based speech enhancement
- Computer Science
- 2017 Hands-free Speech Communications and Microphone Arrays (HSCMA)
- 2017
A novel multiple-target joint learning approach is designed to fully utilize this complementarity, and the proposed framework consistently and significantly improves the objective measures for both speech quality and intelligibility.
Shared Network for Speech Enhancement Based on Multi-Task Learning
- Computer Science
- 2020 15th International Conference on Computer Science & Education (ICCSE)
- 2020
This work proposes a two-stage method called ShareNet that first trains a convolutional neural network to perform noise reduction, and then stacks these two pretrained blocks while keeping the parameters shared to perform both denoising and repairing tasks.
A Multi-Target SNR-Progressive Learning Approach to Regression Based Speech Enhancement
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2020
Evaluated on simulated speech data, experimental results in unseen-noise cases demonstrate that the proposed SNR-PL approach consistently performs better than the conventional LSTM approach in terms of objective speech enhancement measures for speech intelligibility and quality.
Multi-Metrics Learning for Speech Enhancement
- Computer Science
- ArXiv
- 2017
Experimental results show that the proposed method can notably outperform the conventional DNN-based speech enhancement system that enhances the magnitude spectrogram alone and the MML criterion can further improve some objective metrics without trading off other objective metric scores.
A Mask-Based Post Processing Approach for Improving the Quality and Intelligibility of Deep Neural Network Enhanced Speech
- Computer Science
- 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA)
- 2017
Objective tests show that the proposed approach always improves both speech quality and intelligibility, and it outperforms a corresponding baseline system in both matched and mismatched noise conditions.
Densely Connected Progressive Learning for LSTM-Based Speech Enhancement
- Computer Science
- 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
Experimental results demonstrate that the dense structure with deeper LSTM layers can yield significant gains of speech intelligibility measure for all noise types and levels and the post-processing with more targets tends to achieve better performance.
A Parallel-Data-Free Speech Enhancement Method Using Multi-Objective Learning Cycle-Consistent Generative Adversarial Network
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2020
This work proposes a novel parallel-data-free speech enhancement method that employs the cycle-consistent generative adversarial network (CycleGAN) and multi-objective learning, and that is effective in improving speech quality and intelligibility when the networks are trained under the parallel data.
Complex spectrogram enhancement by convolutional neural network with multi-metrics learning
- Computer Science
- 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)
- 2017
A novel convolutional neural network model is proposed for complex spectrogram enhancement, namely estimating clean real and imaginary (RI) spectrograms from noisy ones, and the learning process is called multi-metrics learning (MML).
References
A Regression Approach to Speech Enhancement Based on Deep Neural Networks
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2015
The proposed DNN approach can effectively suppress highly nonstationary noise, which is tough to handle in general, and is effective in dealing with noisy speech data recorded in real-world scenarios without generating the annoying musical artifact commonly observed in conventional enhancement methods.
On Training Targets for Supervised Speech Separation
- Computer Science, Physics
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2014
Results in various test conditions reveal that the two ratio mask targets, the IRM and the FFT-MASK, outperform the other targets in terms of objective intelligibility and quality metrics, and that masking based targets, in general, are significantly better than spectral envelope based targets.
An Experimental Study on Speech Enhancement Based on Deep Neural Networks
- Computer Science
- IEEE Signal Processing Letters
- 2014
This letter presents a regression-based speech enhancement framework using deep neural networks (DNNs) with a multiple-layer deep architecture that tends to achieve significant improvements in terms of various objective quality measures.
Multi-task learning in deep neural networks for improved phoneme recognition
- Computer Science
- 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
It is demonstrated that, even on a strong baseline, multi-task learning can provide a significant decrease in error rate, and this paper explores three natural choices for the secondary task: the phone label, the phone context, and the state context.
Dynamic noise aware training for speech enhancement based on deep neural networks
- Computer Science
- INTERSPEECH
- 2014
Three algorithms to address the mismatch problem in deep neural network (DNN) based speech enhancement are proposed; they suppress highly non-stationary noise better than all the competing state-of-the-art techniques.
Speech enhancement based on deep denoising autoencoder
- Computer Science
- INTERSPEECH
- 2013
Experimental results show that increasing the depth of the DAE consistently improves performance when a large training data set is given, and that, compared with a minimum mean square error based speech enhancement algorithm, the proposed denoising DAE provides superior performance on the three objective evaluations.
Denoising deep neural networks based voice activity detection
- Computer Science
- 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
Experimental results show that the proposed denoising-deep-neural-network (DDNN) based VAD not only outperforms the DBN-based VAD but also shows an apparent performance improvement of the deep layers over shallower layers.
An investigation of deep neural networks for noise robust speech recognition
- Computer Science
- 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
The noise robustness of DNN-based acoustic models can match state-of-the-art performance on the Aurora 4 task without any explicit noise compensation and can be further improved by incorporating information about the environment into DNN training using a new method called noise-aware training.
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
- Computer Science
- IEEE Transactions on Audio, Speech, and Language Processing
- 2012
This paper proposes a pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output and that can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs.
Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization
- Computer Science
- IEEE Transactions on Audio, Speech, and Language Processing
- 2013
This paper proposes a novel speech enhancement method that is based on a Bayesian formulation of NMF (BNMF), and compares the performance of the developed algorithms with state-of-the-art speech enhancement schemes using various objective measures.