On the Importance of Super-Gaussian Speech Priors for Machine-Learning Based Speech Enhancement

  • Robert Rehr, Timo Gerkmann
  • Published 15 March 2017
  • Computer Science
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing
For enhancing noisy signals, machine-learning based single-channel speech enhancement schemes exploit prior knowledge about typical speech spectral structures. To ensure a good generalization and to meet requirements in terms of computational complexity and memory consumption, certain methods restrict themselves to learning speech spectral envelopes. We refer to these approaches as machine-learning spectral envelope (MLSE)-based approaches. In this paper, we show by means of theoretical and… 

Robust Speech Enhancement Using Statistical Signal Processing and Machine-Learning

The aim of this thesis is to increase the robustness of machine-learning (ML)-based and non-ML-based single-channel speech enhancement algorithms by exploiting synergies between the two approaches, using super-Gaussian estimators to suppress the background noise even if the speech PSD is overestimated.

On Speech Enhancement Under PSD Uncertainty

A novel nonlinear clean speech estimator is derived that takes into account prior knowledge about the absolute value of typical speech PSDs and provides uncertainty-aware counterparts to a number of well-known conventional clean speech estimators such as the Wiener filter and Ephraim and Malah's amplitude estimators.

Real-Time Speech Enhancement Algorithm Based on Attention LSTM

Because traditional single-channel speech enhancement algorithms are sensitive to the environment and perform poorly, a speech enhancement algorithm based on attention-gated long short-term memory (LSTM) is proposed, which maintains high real-time performance and fast convergence speed.

Modulation-Domain Kalman Filtering for Monaural Blind Speech Denoising and Dereverberation

A monaural speech enhancement algorithm based on modulation-domain Kalman filtering is proposed that blindly tracks the time–frequency log-magnitude spectra of speech and reverberation; it is evaluated in terms of speech quality, speech intelligibility, and dereverberation performance.

Monaural Speech Enhancement Using a Multi-Branch Temporal Convolutional Network

The extensive experimental investigation suggests that the MB-TCNs outperform the residual long short-term memory networks (ResLSTMs), temporal convolutional networks (TCNs), and the CNN networks that employ dense aggregations in terms of speech intelligibility and quality, while providing superior parameter efficiency.

Multimodal Speech Enhancement Using Burst Propagation

Experiments show that MBURST can reproduce similar mask reconstructions to the multimodal backpropagation-based baseline while demonstrating outstanding energy management, reducing the neuron rates to values up to 70% lower.

On Single-Channel Speech Enhancement and On Non-Linear Modulation-Domain Kalman Filtering

This report describes how different algorithms perform speech enhancement; the algorithms discussed are addressed to researchers interested in monaural speech enhancement.

A Survey on Probabilistic Models in Human Perception and Machines

This mini review of probabilistic models in machine SP and human psychophysics focuses on audio and audio-visual processing, using examples of speech enhancement, automatic speech recognition, audio-visual cue integration, source separation, and causal inference to illustrate the basic principles of the probabilistic approach.



Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization

This paper proposes a novel speech enhancement method that is based on a Bayesian formulation of NMF (BNMF), and compares the performance of the developed algorithms with state-of-the-art speech enhancement schemes using various objective measures.

MixMax Approximation as a Super-Gaussian Log-Spectral Amplitude Estimator for Speech Enhancement

This paper shows how this approximation can be used in combination with non-trained, blind speech and noise power estimators derived in the spectral domain to interpret the MixMax based clean speech estimator as a super-Gaussian log-spectral amplitude estimator.

Multiplicative Update of Auto-Regressive Gains for Codebook-Based Speech Enhancement

  • Qi He, Feng Bao, C. Bao
  • Computer Science
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • 2017
An improved codebook-driven Wiener filter combined with the speech-presence probability is developed, so that the proposed method achieves the goal of removing the residual noise between the harmonics of noisy speech.

Spectral Domain Speech Enhancement Using HMM State-Dependent Super-Gaussian Priors

A spectral domain speech enhancement algorithm is developed, and hidden Markov model (HMM) based MMSE estimators for speech periodogram coefficients are derived under a gamma assumption in both a high uniform resolution and a reduced-resolution Mel domain.

Corpus-Based Speech Enhancement With Uncertainty Modeling and Cepstral Smoothing

A new approach for corpus-based speech enhancement that significantly improves over a method published by Xiao and Nickel in 2010 and employs a Gaussian mixture model instead of a vector quantizer in the phoneme recognition front-end is presented.

Nonnegative HMM for Babble Noise Derived From Speech HMM: Application to Speech Enhancement

Deriving a good model for multitalker babble noise can facilitate different speech processing algorithms, e.g., noise reduction, to reduce the so-called cocktail party difficulty. In the available

On Training Targets for Supervised Speech Separation

Results in various test conditions reveal that the two ratio mask targets, the IRM and the FFT-MASK, outperform the other targets in terms of objective intelligibility and quality metrics, and that masking based targets, in general, are significantly better than spectral envelope based targets.

Speech Enhancement Using Gaussian Scale Mixture Models

The proposed methods were applied to enhance speech corrupted by Gaussian noise and speech-shaped noise (SSN) and effectively reduced the SSN, which algorithms based on spectral analysis were not able to suppress.

Codebook driven short-term predictor parameter estimation for speech enhancement

Experimental results show that the use of a priori information and the calculation of the instantaneous speech and noise excitation variances on a frame-by-frame basis result in good performance in both stationary and nonstationary noise conditions.

Speech enhancement based on log spectral envelope model and harmonicity-derived spectral mask, and its coupling with feature compensation

  • T. Yoshioka, T. Nakatani
  • Physics
  • 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2011
The key to the method is its use of a harmonic structure to define the prior distribution of a spectral mask, which is used for both accurate noise estimation and attenuation; log mel-frequency feature enhancement is combined with this method to take advantage of low dimensionality.