• Corpus ID: 2038308

Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement

@inproceedings{Xu2015MultiobjectiveLA,
  title={Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement},
  author={Yong Xu and Jun Du and Zhen Huang and Lirong Dai and Chin-Hui Lee},
  booktitle={INTERSPEECH},
  year={2015}
}
We propose a multi-objective framework to learn both secondary targets not directly related to the intended task of speech enhancement (SE) and the primary target of the clean log-power spectra (LPS) features to be used directly for constructing the enhanced speech signals. In deep neural network (DNN) based SE, we introduce an auxiliary structure to learn secondary continuous features, such as mel-frequency cepstral coefficients (MFCCs), and categorical information, such as the ideal binary… 
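The multi-objective learning described in the abstract can be sketched as a weighted sum of a primary loss on the clean LPS target and secondary losses on MFCC features and the ideal binary mask (IBM). The weights, array shapes, and function name below are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def multi_objective_loss(pred_lps, true_lps, pred_mfcc, true_mfcc,
                         pred_ibm, true_ibm, alpha=0.1, beta=0.1):
    """Weighted combination of a primary MSE loss on log-power spectra (LPS)
    with secondary losses on MFCC features (continuous) and the ideal binary
    mask (categorical). alpha/beta are illustrative weights only."""
    primary = np.mean((pred_lps - true_lps) ** 2)        # primary SE target
    mfcc_loss = np.mean((pred_mfcc - true_mfcc) ** 2)    # secondary continuous target
    eps = 1e-12                                          # numerical safety for log
    ibm_loss = -np.mean(true_ibm * np.log(pred_ibm + eps)
                        + (1.0 - true_ibm) * np.log(1.0 - pred_ibm + eps))
    return primary + alpha * mfcc_loss + beta * ibm_loss
```

In this sketch the secondary targets only shape training through the combined loss; at enhancement time, only the LPS output would be used to reconstruct the signal, matching the framework's description.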


Robust Speech Recognition based on Multi-Objective Learning with GRU Network
TLDR
The experimental results show that the new multi-objective scheme with joint feature mapping and the posterior probability learning method improves the performance of SE and significantly lowers the Character Error Rate of the AM compared to the deep neural network (DNN) baseline.
SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement
TLDR
Experimental results demonstrate that SNR-based progressive learning can effectively improve perceptual evaluation of speech quality and short-time objective intelligibility in low-SNR environments, and reduce the model parameters by 50% when compared with the DNN baseline system.
Multiple-target deep learning for LSTM-RNN based speech enhancement
TLDR
A novel multiple-target joint learning approach is designed to fully exploit the complementarity between targets, and the proposed framework consistently and significantly improves objective measures of both speech quality and intelligibility.
Shared Network for Speech Enhancement Based on Multi-Task Learning
TLDR
This work proposes a two-stage method called ShareNet that first trains a convolutional neural network to perform noise reduction, and then stacks the two pretrained blocks with shared parameters to perform both denoising and repairing tasks.
A Multi-Target SNR-Progressive Learning Approach to Regression Based Speech Enhancement
TLDR
Evaluated on simulated speech data, experimental results in unseen-noise cases demonstrate that the proposed SNR-PL approach consistently outperforms the conventional LSTM approach in terms of objective speech enhancement measures for speech intelligibility and quality.
Multi-Metrics Learning for Speech Enhancement
TLDR
Experimental results show that the proposed method can notably outperform the conventional DNN-based speech enhancement system that enhances the magnitude spectrogram alone and the MML criterion can further improve some objective metrics without trading off other objective metric scores.
A Mask-Based Post Processing Approach for Improving the Quality and Intelligibility of Deep Neural Network Enhanced Speech
TLDR
Objective tests show that the proposed approach always improves both speech quality and intelligibility, and it outperforms a corresponding baseline system in both matched and mismatched noise conditions.
Densely Connected Progressive Learning for LSTM-Based Speech Enhancement
TLDR
Experimental results demonstrate that the dense structure with deeper LSTM layers can yield significant gains of speech intelligibility measure for all noise types and levels and the post-processing with more targets tends to achieve better performance.
A Parallel-Data-Free Speech Enhancement Method Using Multi-Objective Learning Cycle-Consistent Generative Adversarial Network
  • Yang Xiang, C. Bao
  • Computer Science
    IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • 2020
TLDR
A novel parallel-data-free speech enhancement method, in which the cycle-consistent generative adversarial network (CycleGAN) and multi-objective learning are employed, which effectively improves speech quality and intelligibility without requiring parallel training data.
Complex spectrogram enhancement by convolutional neural network with multi-metrics learning
TLDR
A novel convolutional neural network model is proposed for complex spectrogram enhancement, namely estimating clean real and imaginary (RI) spectrograms from noisy ones, and the learning process is called multi-metrics learning (MML).
...

References

SHOWING 1-10 OF 44 REFERENCES
A Regression Approach to Speech Enhancement Based on Deep Neural Networks
TLDR
The proposed DNN approach effectively suppresses highly nonstationary noise, which is hard to handle in general, and deals with noisy speech data recorded in real-world scenarios without generating the annoying musical artifacts commonly observed in conventional enhancement methods.
On Training Targets for Supervised Speech Separation
TLDR
Results in various test conditions reveal that the two ratio mask targets, the IRM and the FFT-MASK, outperform the other targets in terms of objective intelligibility and quality metrics, and that masking based targets, in general, are significantly better than spectral envelope based targets.
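The ideal ratio mask (IRM) named in the TLDR above is commonly defined from the clean-speech and noise power spectra; the sketch below uses this standard formulation with square-root compression, which may differ in detail from the cited paper's exact definition:

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """Standard IRM from clean-speech and noise magnitude spectrograms:
    IRM = (S^2 / (S^2 + N^2)) ** beta, with beta=0.5 the usual
    square-root compression. Values lie in [0, 1] by construction."""
    speech_pow = speech_mag ** 2
    noise_pow = noise_mag ** 2
    return (speech_pow / (speech_pow + noise_pow + 1e-12)) ** beta
```

At inference, a network-estimated mask of this form would be multiplied pointwise with the noisy magnitude spectrogram before resynthesis; training targets like this are what the comparison in the cited study evaluates.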
An Experimental Study on Speech Enhancement Based on Deep Neural Networks
TLDR
This letter presents a regression-based speech enhancement framework using deep neural networks (DNNs) with a multiple-layer deep architecture that tends to achieve significant improvements in terms of various objective quality measures.
Multi-task learning in deep neural networks for improved phoneme recognition
  • M. Seltzer, J. Droppo
  • Computer Science
    2013 IEEE International Conference on Acoustics, Speech and Signal Processing
  • 2013
TLDR
It is demonstrated that, even on a strong baseline, multi-task learning can provide a significant decrease in error rate, and this paper explores three natural choices for the secondary task: the phone label, the phone context, and the state context.
Dynamic noise aware training for speech enhancement based on deep neural networks
TLDR
Three algorithms are proposed to address the mismatch problem in deep neural network (DNN) based speech enhancement, and they suppress highly non-stationary noise better than all competing state-of-the-art techniques.
Speech enhancement based on deep denoising autoencoder
TLDR
Experimental results show that increasing the depth of the DAE consistently improves performance when a large training data set is given, and the proposed denoising DAE outperforms a minimum mean square error based speech enhancement algorithm on all three objective evaluations.
Denoising deep neural networks based voice activity detection
  • Xiao-Lei Zhang, Ji Wu
  • Computer Science
    2013 IEEE International Conference on Acoustics, Speech and Signal Processing
  • 2013
TLDR
Experimental results show that the proposed denoising-deep-neural-network (DDNN) based VAD not only outperforms the DBN-based VAD but also shows a clear performance improvement of deeper layers over shallower ones.
An investigation of deep neural networks for noise robust speech recognition
TLDR
The noise robustness of DNN-based acoustic models can match state-of-the-art performance on the Aurora 4 task without any explicit noise compensation and can be further improved by incorporating information about the environment into DNN training using a new method called noise-aware training.
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
TLDR
A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output, and that can significantly outperform conventional context-dependent Gaussian mixture model (GMM)-HMMs.
Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization
TLDR
This paper proposes a novel speech enhancement method that is based on a Bayesian formulation of NMF (BNMF), and compares the performance of the developed algorithms with state-of-the-art speech enhancement schemes using various objective measures.
...