Speech Denoising with Deep Feature Losses
@article{Germain2019SpeechDW,
  title   = {Speech Denoising with Deep Feature Losses},
  author  = {François G. Germain and Qifeng Chen and Vladlen Koltun},
  journal = {ArXiv},
  year    = {2019},
  volume  = {abs/1806.10522}
}
We present an end-to-end deep learning approach to denoising speech signals by processing the raw waveform directly. [...] The advantage of the new approach is particularly pronounced for the hardest data with the most intrusive background noise, for which denoising is most needed and most challenging.
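As a rough illustration of the idea (not the authors' implementation), a deep feature loss compares the denoised signal to its clean reference in the activation space of a pretrained audio network, rather than sample by sample. A minimal numpy sketch, where a toy stack of fixed random 1-D convolutions stands in for the pretrained network:

```python
import numpy as np

def feature_extractor(x, weights):
    """Toy stand-in for a pretrained audio network: a stack of
    fixed 1-D convolutions with ReLU, returning every layer's
    activations."""
    feats = []
    for w in weights:
        x = np.convolve(x, w, mode="same")
        x = np.maximum(x, 0.0)  # ReLU
        feats.append(x)
    return feats

def deep_feature_loss(denoised, clean, weights):
    """Mean L1 distance between the extractor's activations on the
    denoised output and on the clean reference, summed over layers."""
    f_d = feature_extractor(denoised, weights)
    f_c = feature_extractor(clean, weights)
    return sum(np.abs(a - b).mean() for a, b in zip(f_d, f_c))

rng = np.random.default_rng(0)
weights = [rng.standard_normal(9) for _ in range(3)]  # 3 toy "layers"
clean = rng.standard_normal(1024)
noisy = clean + 0.3 * rng.standard_normal(1024)

# A perfect denoiser scores zero; a noisy output scores positive.
assert deep_feature_loss(clean, clean, weights) == 0.0
assert deep_feature_loss(noisy, clean, weights) > 0.0
```

In practice the extractor is a real pretrained classification network (the paper uses one trained on audio tagging tasks) and the loss is backpropagated through it into the denoising network.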
110 Citations
Multi-objective noisy-based deep feature loss for speech enhancement
- Computer Science · Symposium on Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments (WILGA)
- 2019
This work shows that using only deep features in the loss function yields a significant improvement in measured speech signal quality, and suggests that deep-feature losses could help optimize objective functions that are difficult to differentiate.
Audio Denoising with Deep Network Priors
- Computer Science · ArXiv
- 2019
A method for audio denoising that combines processing in the time domain and the time-frequency domain, and trains only on the specific audio clip being denoised.
Deep Network Perceptual Losses for Speech Denoising
- Computer Science · ArXiv
- 2020
This work first trained deep neural networks to classify either spoken words or environmental sounds from audio, then trained an audio transform to map noisy speech to an audio waveform that minimized 'perceptual' losses derived from the recognition network.
Improving deep speech denoising by Noisy2Noisy signal mapping
- Computer Science · Applied Acoustics
- 2021
Speech Enhancement Using Deep Learning Methods: A Review
- Computer Science · Jurnal Elektronika dan Telekomunikasi
- 2021
The trend in deep learning architectures has shifted from the standard deep neural network to the convolutional neural network (CNN), which can efficiently learn temporal information in speech signals, and the generative adversarial network (GAN), which trains two networks against each other.
Speech Denoising with Auditory Models
- Computer Science · Interspeech
- 2021
The results show that deep features can guide speech enhancement, but suggest that they do not yet outperform simple alternatives that do not involve learned features.
Perceptual Loss Based Speech Denoising with an Ensemble of Audio Pattern Recognition and Self-Supervised Models
- Computer Science · ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2021
A generalized framework called Perceptual Ensemble Regularization Loss (PERL), built on the idea of perceptual losses, is introduced, along with the critical observation that state-of-the-art multi-task weight-learning methods cannot outperform hand tuning, perhaps due to domain mismatch and weak complementarity of losses.
Speech Enhancement using the Wave-U-Net with Spectral Losses
- Physics
- 2020
Speech enhancement and source separation are related tasks that aim to extract and/or improve a signal of interest from a recording that may involve sounds from various sources, reverberation, and/or…
Speech Denoising with Residual Attention U-Net
- Computer Science
- 2020
The residual attention U-Net, which connects the same layers of multiple stacked residual channel attention encoder/decoder models, is proposed for speech denoising, removing background noise from noisy monaural speech signals by directly processing the raw waveform.
Deep speech inpainting of time-frequency masks
- Computer Science · INTERSPEECH
- 2020
An end-to-end framework for speech inpainting, the context-based recovery of missing or severely distorted parts of a time-frequency representation of speech, based on a convolutional U-Net trained via deep feature losses computed with speechVGG, a deep speech feature extractor pre-trained on an auxiliary word-classification task.
References
Showing 1-10 of 46 references
Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech
- Physics · SSW
- 2016
Two different approaches to speech enhancement for training TTS systems, following conventional speech enhancement methods, are investigated; results show that the second approach yields larger MCEP distortion but smaller F0 errors.
A Regression Approach to Speech Enhancement Based on Deep Neural Networks
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2015
The proposed DNN approach can well suppress highly nonstationary noise, which is tough to handle in general, and is effective in dealing with noisy speech data recorded in real-world scenarios without the generation of the annoying musical artifact commonly observed in conventional enhancement methods.
A deep neural network for time-domain signal reconstruction
- Computer Science · 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2015
A new deep network is proposed that directly reconstructs the time-domain clean signal through an inverse fast Fourier transform layer and significantly outperforms a recent non-negative matrix factorization based separation system in both objective speech intelligibility and quality.
Speech enhancement based on deep denoising autoencoder
- Computer Science · INTERSPEECH
- 2013
Experimental results show that increasing the depth of the DAE consistently improves performance when a large training data set is given; compared with a minimum mean square error based speech enhancement algorithm, the proposed denoising DAE delivered superior performance on the three objective evaluations.
A Wavenet for Speech Denoising
- Computer Science · 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
The proposed model adaptation retains Wavenet's powerful acoustic modeling capabilities, while significantly reducing its time-complexity by eliminating its autoregressive nature.
Raw waveform-based speech enhancement by fully convolutional networks
- Computer Science · 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
- 2017
The proposed fully convolutional network (FCN) model can not only effectively recover the waveforms but also outperform the LPS-based DNN baseline in terms of short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ).
Speech Enhancement Using Bayesian Wavenet
- Computer Science · INTERSPEECH
- 2017
This paper presents a Bayesian speech enhancement framework, called BaWN (Bayesian WaveNet), which operates directly on raw audio samples and adopts the recently proposed WaveNet, shown to be effective in modeling conditional distributions of speech samples while generating natural speech.
Speech Enhancement in Multiple-Noise Conditions Using Deep Neural Networks
- Computer Science · INTERSPEECH
- 2016
This paper addresses improving speech quality in office environments, where multiple stationary and non-stationary noises can be present simultaneously, and proposes several Deep Neural Network based strategies for speech enhancement in these scenarios.
SEGAN: Speech Enhancement Generative Adversarial Network
- Computer Science · INTERSPEECH
- 2017
This work proposes the use of generative adversarial networks for speech enhancement, operating at the waveform level and training the model end-to-end; it incorporates 28 speakers and 40 different noise conditions into a single model, so that model parameters are shared across them.
A Deep Ensemble Learning Method for Monaural Speech Separation
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2016
A deep ensemble method, named multicontext networks, is proposed to address monaural speech separation; it is found that predicting the ideal time-frequency mask is more efficient in utilizing clean training speech, while predicting clean speech is less sensitive to SNR variations.