On the Importance of Super-Gaussian Speech Priors for Machine-Learning Based Speech Enhancement

@article{Rehr2017OnTI,
  title={On the Importance of Super-Gaussian Speech Priors for Machine-Learning Based Speech Enhancement},
  author={Robert Rehr and Timo Gerkmann},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  year={2017},
  volume={26},
  pages={357--366}
}
  • R. Rehr, Timo Gerkmann
  • Published 15 March 2017
  • Computer Science
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing
For enhancing noisy signals, machine-learning based single-channel speech enhancement schemes exploit prior knowledge about typical speech spectral structures. To ensure a good generalization and to meet requirements in terms of computational complexity and memory consumption, certain methods restrict themselves to learning speech spectral envelopes. We refer to these approaches as machine-learning spectral envelope (MLSE)-based approaches. In this paper, we show by means of theoretical and… 

Robust Speech Enhancement Using Statistical Signal Processing and Machine-Learning

The aim of this thesis is to increase the robustness of machine-learning (ML)-based and non-ML-based single-channel speech enhancement algorithms by exploiting synergies between both approaches by using super-Gaussian estimators to suppress the background noise even if the speech PSD is overestimated.

On Speech Enhancement Under PSD Uncertainty

A novel nonlinear clean speech estimator is derived that takes into account prior knowledge about the absolute value of typical speech PSDs and provides uncertainty-aware counterparts to a number of well-known conventional clean speech estimators such as the Wiener filter and Ephraim and Malah's amplitude estimators.

Real-Time Speech Enhancement Algorithm Based on Attention LSTM

Because traditional single-channel speech enhancement algorithms are sensitive to the environment and perform poorly, a speech enhancement algorithm based on attention-gated long short-term memory (LSTM) is proposed, which maintains high real-time performance and fast convergence speed.

Monaural Speech Enhancement Using a Multi-Branch Temporal Convolutional Network

The extensive experimental investigation suggests that the MB-TCNs outperform the residual long short-term memory networks (ResLSTMs), temporal convolutional networks (TCNs), and the CNN networks that employ dense aggregations in terms of speech intelligibility and quality, while providing superior parameter efficiency.

Multimodal Speech Enhancement Using Burst Propagation

Experiments show that MBURST can reproduce similar mask reconstructions to the multimodal backpropagation-based baseline while demonstrating outstanding energy management, reducing the neuron rates to values up to 70% lower.

A Survey on Probabilistic Models in Human Perception and Machines

This mini review of probabilistic models in machine SP and human psychophysics focuses on audio and audio-visual processing, using examples of speech enhancement, automatic speech recognition, audio-visual cue integration, source separation, and causal inference to illustrate the basic principles of the probabilistic approach.

Blind and Semi-blind Anechoic Mixing System Identification Using Multichannel Matching Pursuit

A new procedure for estimating the mixing system parameters (attenuations and delays), which can be applied to more than two mixtures and is not restricted to non-negative attenuation coefficients, is presented.

Comprehensive Review of Various Speech Enhancement Techniques

In this paper, a comprehensive review of various speech enhancement algorithms is carried out, because speech-related applications are incomplete without enhancing speech quality and intelligibility.

Emotion Recognition from Speech Signals Using DCNN with Hybrid GA-GWO Algorithm

A Deep Convolutional Neural Network model is proposed whose training combines features of both the GA and GWO techniques, and the hybrid Genetic Algorithm-Grey Wolf Optimization algorithm is presented.

Adaptive Weiner filtering with AR-GWO based optimized fuzzy wavelet neural network for enhanced speech enhancement

An Adaptive Randomized Grey Wolf Optimization (AR-GWO), an improved version of standard Grey Wolf Optimization (GWO), is proposed for properly tuning the tuning factor η in the Wiener filter; the result is referred to as the tuned tuning factor (ηtuned).

References

SHOWING 1-10 OF 61 REFERENCES

Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization

This paper proposes a novel speech enhancement method that is based on a Bayesian formulation of NMF (BNMF), and compares the performance of the developed algorithms with state-of-the-art speech enhancement schemes using various objective measures.

MixMax Approximation as a Super-Gaussian Log-Spectral Amplitude Estimator for Speech Enhancement

This paper shows how this approximation can be used in combination with non-trained, blind speech and noise power estimators derived in the spectral domain to interpret the MixMax based clean speech estimator as a super-Gaussian log-spectral amplitude estimator.

Multiplicative Update of Auto-Regressive Gains for Codebook-Based Speech Enhancement

  • Qi He, Feng Bao, C. Bao
  • Computer Science
    IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • 2017
An improved codebook-driven Wiener filter combined with the speech-presence probability is developed, so that the proposed method achieves the goal of removing the residual noise between the harmonics of noisy speech.

Nonnegative HMM for Babble Noise Derived From Speech HMM: Application to Speech Enhancement

Deriving a good model for multitalker babble noise can facilitate different speech processing algorithms, e.g., noise reduction, to reduce the so-called cocktail party difficulty. In the available

On Training Targets for Supervised Speech Separation

Results in various test conditions reveal that the two ratio mask targets, the IRM and the FFT-MASK, outperform the other targets in terms of objective intelligibility and quality metrics, and that masking based targets, in general, are significantly better than spectral envelope based targets.

Speech Enhancement Using Gaussian Scale Mixture Models

The proposed methods were applied to enhance speech corrupted by Gaussian noise and speech-shaped noise and effectively reduced the SSN, which algorithms based on spectral analysis were not able to suppress.

Codebook driven short-term predictor parameter estimation for speech enhancement

Experimental results show that the use of a priori information and the calculation of the instantaneous speech and noise excitation variances on a frame-by-frame basis result in good performance in both stationary and nonstationary noise conditions.

Speech enhancement based on log spectral envelope model and harmonicity-derived spectral mask, and its coupling with feature compensation

  • T. Yoshioka, T. Nakatani
  • Physics
    2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2011
The key to the method is its use of a harmonic structure to define the prior distribution of a spectral mask, which is used for both accurate noise estimation and attenuation. The paper also combines log mel-frequency feature enhancement with this method to take advantage of low dimensionality.

Speech enhancement based on minimum mean-square error estimation and supergaussian priors

  • Rainer Martin
  • Computer Science
    IEEE Transactions on Speech and Audio Processing
  • 2005
Compared to algorithms based on the Gaussian assumption, such as the Wiener filter or the Ephraim and Malah (1984) MMSE short-time spectral amplitude estimator, the estimators based on these supergaussian densities deliver an improved signal-to-noise ratio.

Analysis of the Decision-Directed SNR Estimator for Speech Enhancement With Respect to Low-SNR and Transient Conditions

A systematic analysis of the performance of noise reduction algorithms in low signal-to-noise ratio (SNR) and transient conditions, where it is illustrated that achieving both a good preservation of speech onsets in transient conditions on one side and the suppression of musical noise on the other can be especially problematic when the decision-directed SNR estimation is used.
...