• Corpus ID: 25742153

Speech enhancement with weighted denoising auto-encoder

@inproceedings{Xia2013SpeechEW,
  title={Speech enhancement with weighted denoising auto-encoder},
  author={Bingyin Xia and Chang-chun Bao},
  booktitle={INTERSPEECH},
  year={2013}
}
A novel speech enhancement method with Weighted Denoising Auto-encoder (WDA) is proposed in this paper. First, the proposed WDA is used to model the relationship between the noisy and clean power spectrums of speech signal. Then, the estimated clean power spectrum is used in the a Posteriori SNR Controlled Recursive Averaging (PCRA) approach for the estimation of the a priori SNR. Finally, the enhanced speech is obtained by Wiener filter operating in the frequency domain. From the test results…
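
The last step of the pipeline described in the abstract, a frequency-domain Wiener filter driven by an a priori SNR estimate, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the details of the WDA or of PCRA, so the a priori SNR is approximated here by the plain ratio of an estimated clean power spectrum to a noise power spectrum. The function names (`wiener_gain`, `enhance_frame`) and the floor parameter `xi_min` are hypothetical and chosen only for illustration.

```python
import numpy as np

def wiener_gain(clean_power_est, noise_power_est, xi_min=1e-3):
    """Per-bin Wiener gain G = xi / (1 + xi), where xi is the a priori SNR.

    Assumption: xi is taken as the ratio of the estimated clean power to the
    noise power. The paper's PCRA smoothing is NOT reproduced here.
    """
    # Guard against division by zero in silent bins.
    noise = np.maximum(noise_power_est, 1e-12)
    # Floor the a priori SNR so the gain never collapses to exactly zero.
    xi = np.maximum(clean_power_est / noise, xi_min)
    return xi / (1.0 + xi)

def enhance_frame(noisy_spectrum, clean_power_est, noise_power_est):
    """Apply the real-valued Wiener gain to one complex STFT frame.

    The noisy phase is kept unchanged; only the magnitude is attenuated.
    """
    return wiener_gain(clean_power_est, noise_power_est) * noisy_spectrum
```

In a full system the `clean_power_est` argument would come from a trained model such as the WDA, and each enhanced frame would be passed through an inverse STFT with overlap-add to reconstruct the waveform.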

Perception Optimized Deep Denoising AutoEncoders for Speech Enhancement
TLDR
A novel objective loss function is proposed, which takes into account the perceptual quality of speech and is used to train Perceptually Optimized Speech Denoising Auto-Encoders (POS-DAE), and a two-level DNN architecture for denoising and enhancement is introduced.
Speech Enhancement Using Convolutional Denoising Autoencoder
TLDR
In this study, a speech enhancement system is investigated using a Convolutional Denoising Autoencoder (CDAE), which takes advantage of the 2D structured inputs of the features extracted from speech signals and also considers the local temporal relationship among the features.
A Mask-Based Post Processing Approach for Improving the Quality and Intelligibility of Deep Neural Network Enhanced Speech
TLDR
Objective tests show that the proposed approach always improves both speech quality and intelligibility, and it outperforms a corresponding baseline system in both matched and mismatched noise conditions.
Speech Enhancement Based on Cepstral Mapping and Deep Neural Networks
  • Yang Xiang, C. Bao
  • Computer Science
    2018 IEEE 4th International Conference on Computer and Communications (ICCC)
  • 2018
TLDR
A fusion framework combining cepstral feature mapping and a Wiener filter is proposed to obtain the enhanced speech signal; it achieves state-of-the-art performance in improving the quality and intelligibility of noisy speech.
Joint noise and mask aware training for DNN-based speech enhancement with SUB-band features
TLDR
A joint noise and mask aware training strategy is proposed for deep neural network (DNN) based speech enhancement with sub-band features, and the ideal ratio mask (IRM) is verified to be strongly complementary to dynamic noise estimation via joint aware training of the DNN.
A Noise Prediction and Time-Domain Subtraction Approach to Deep Neural Network Based Speech Enhancement
Deep neural networks (DNNs) have recently been successfully applied to the speech enhancement task; however, the low signal-to-noise ratio (SNR) performance of DNN-based speech enhancement systems
A Regression Approach to Speech Enhancement Based on Deep Neural Networks
TLDR
The proposed DNN approach can well suppress highly nonstationary noise, which is tough to handle in general, and is effective in dealing with noisy speech data recorded in real-world scenarios without the generation of the annoying musical artifact commonly observed in conventional enhancement methods.
Monaural Speech Separation Using a Phase-Aware Deep Denoising Auto Encoder
  • Donald S. Williamson
  • Computer Science
    2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP)
  • 2018
TLDR
The results show that the paDDAE offers improvements over traditional DDAEs in terms of objective speech quality and intelligibility.
Wiener Gain and Deep Neural Networks: A Well-Balanced Pair For Speech Enhancement
TLDR
To design a DNN architecture adjusted for the speech enhancement task, various configuration issues frequently used in DNN-based solutions, including speech representations, residual connections, and causal vs. non-causal designs are studied.
Speech Enhancement Using Residual Convolutional Neural Network
TLDR
The method proposed here uses two-dimensional convolutional neural networks with residual connections, which take advantage of two key facts, i.e., the nonlinear functions learned by convolutional neural networks and the linearity introduced by residual networks.

References

Speech enhancement using a minimum mean square error short-time spectral amplitude estimator
TLDR
This paper derives a minimum mean-square error STSA estimator, based on modeling speech and noise spectral components as statistically independent Gaussian random variables, which results in a significant reduction of the noise, and provides enhanced speech with colorless residual noise.
Enhancement and bandwidth compression of noisy speech
TLDR
An overview of the variety of techniques that have been proposed for enhancement and bandwidth compression of speech degraded by additive background noise is provided to suggest a unifying framework in terms of which the relationships between these systems is more visible and which hopefully provides a structure which will suggest fruitful directions for further research.
Suppression of acoustic noise in speech using spectral subtraction
TLDR
A stand-alone noise suppression algorithm that resynthesizes a speech waveform and can be used as a pre-processor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.
Noise estimation by minima controlled recursive averaging for robust speech enhancement
TLDR
A minima controlled recursive averaging (MCRA) approach for noise estimation that is computationally efficient, robust with respect to the input signal-to-noise ratio (SNR) and type of underlying additive noise, and characterized by the ability to quickly follow abrupt changes in the noise spectrum.
Image Denoising and Inpainting with Deep Neural Networks
TLDR
A novel approach to low-level vision problems that combines sparse coding and deep networks pre-trained with denoising auto-encoder (DA) is presented and can automatically remove complex patterns like superimposed text from an image, rather than simple patterns like pixels missing at random.
Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion
TLDR
This work clearly establishes the value of using a denoising criterion as a tractable unsupervised objective to guide the learning of useful higher level representations.
A statistical model-based voice activity detection
TLDR
An effective hang-over scheme, which considers the previous observations by a first-order Markov process modeling of speech occurrences, is proposed; it shows significantly better performance than the G.729B VAD in low signal-to-noise ratio (SNR) and vehicular noise environments.
De-noising by soft-thresholding
  • D. Donoho
  • Computer Science
    IEEE Trans. Inf. Theory
  • 1995
TLDR
The authors prove two results about this type of estimator that are unprecedented in several ways: with high probability $\hat{f}^{*}_{n}$ is at least as smooth as $f$, in any of a wide variety of smoothness measures.
Extracting and composing robust features with denoising autoencoders
TLDR
This work introduces and motivates a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern.
Gradient-based learning applied to document recognition
TLDR
This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task, and convolutional neural networks are shown to outperform all other techniques.