On the Importance of Super-Gaussian Speech Priors for Machine-Learning Based Speech Enhancement
@article{Rehr2017OnTI, title={On the Importance of Super-Gaussian Speech Priors for Machine-Learning Based Speech Enhancement}, author={Robert Rehr and Timo Gerkmann}, journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, year={2017}, volume={26}, pages={357-366} }
For enhancing noisy signals, machine-learning based single-channel speech enhancement schemes exploit prior knowledge about typical speech spectral structures. To ensure a good generalization and to meet requirements in terms of computational complexity and memory consumption, certain methods restrict themselves to learning speech spectral envelopes. We refer to these approaches as machine-learning spectral envelope (MLSE)-based approaches. In this paper, we show by means of theoretical and…
16 Citations
Robust Speech Enhancement Using Statistical Signal Processing and Machine-Learning
- Computer Science
- 2019
The aim of this thesis is to increase the robustness of machine-learning (ML)-based and non-ML-based single-channel speech enhancement algorithms by exploiting synergies between the two approaches, using super-Gaussian estimators to suppress the background noise even if the speech PSD is overestimated.
On Speech Enhancement Under PSD Uncertainty
- Engineering
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2018
A novel nonlinear clean speech estimator is derived that takes into account prior knowledge about the absolute value of typical speech PSDs and provides uncertainty-aware counterparts to a number of well-known conventional clean speech estimators such as the Wiener filter and Ephraim and Malah's amplitude estimators.
Real-Time Speech Enhancement Algorithm Based on Attention LSTM
- Computer Science
- IEEE Access
- 2020
Because traditional single-channel speech enhancement algorithms are sensitive to the environment and perform poorly, a speech enhancement algorithm based on attention-gated long short-term memory (LSTM) is proposed, which maintains high real-time performance and fast convergence speed.
Monaural Speech Enhancement Using a Multi-Branch Temporal Convolutional Network
- Computer Science
- SSRN Electronic Journal
- 2022
The extensive experimental investigation suggests that the MB-TCNs outperform the residual long short-term memory networks (ResLSTMs), temporal convolutional networks (TCNs), and the CNN networks that employ dense aggregations in terms of speech intelligibility and quality, while providing superior parameter efficiency.
Multimodal Speech Enhancement Using Burst Propagation
- Computer Science
- ArXiv
- 2022
Experiments show that MBURST can reproduce similar mask reconstructions to the multimodal backpropagation-based baseline while demonstrating outstanding energy management, reducing the neuron rates to values up to 70% lower.
A Survey on Probabilistic Models in Human Perception and Machines
- Computer Science
- Frontiers in Robotics and AI
- 2020
This mini review of probabilistic models in machine SP and human psychophysics focuses on audio and audio-visual processing, using examples of speech enhancement, automatic speech recognition, audio-visual cue integration, source separation, and causal inference to illustrate the basic principles of the probabilistic approach.
Blind and Semi-blind Anechoic Mixing System Identification Using Multichannel Matching Pursuit
- Engineering
- Circuits, Systems, and Signal Processing
- 2021
A new procedure for estimating the mixing system parameters (attenuations and delays), which can be applied to more than two mixtures and is not restricted to non-negative attenuation coefficients, is presented.
Comprehensive Review of Various Speech Enhancement Techniques
- Computer Science
- 2019
In this paper, a comprehensive review of various speech enhancement algorithms is carried out, because speech-related applications are incomplete without enhancing speech quality and intelligibility.
Emotion Recognition from Speech Signals Using DCNN with Hybrid GA-GWO Algorithm
- Computer Science
- Multimedia Research
- 2019
A Deep Convolutional Neural Network model is proposed whose training combines features of both the GA and GWO techniques in a hybrid Genetic Algorithm-Grey Wolf Optimization algorithm.
Adaptive Weiner filtering with AR-GWO based optimized fuzzy wavelet neural network for enhanced speech enhancement
- Computer Science
- Multimedia Tools and Applications
- 2022
An Adaptive Randomized Grey Wolf Optimization (AR-GWO), an improved version of standard Grey Wolf Optimization (GWO), is proposed for tuning the factor η of the Wiener filter; the result is referred to as the tuned tuning factor (ηtuned).
References
SHOWING 1-10 OF 61 REFERENCES
Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization
- Computer Science
- IEEE Transactions on Audio, Speech, and Language Processing
- 2013
This paper proposes a novel speech enhancement method that is based on a Bayesian formulation of NMF (BNMF), and compares the performance of the developed algorithms with state-of-the-art speech enhancement schemes using various objective measures.
MixMax Approximation as a Super-Gaussian Log-Spectral Amplitude Estimator for Speech Enhancement
- Computer Science
- INTERSPEECH
- 2017
This paper shows how this approximation can be used in combination with non-trained, blind speech and noise power estimators derived in the spectral domain to interpret the MixMax based clean speech estimator as a super-Gaussian log-spectral amplitude estimator.
Multiplicative Update of Auto-Regressive Gains for Codebook-Based Speech Enhancement
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2017
An improved codebook-driven Wiener filter combined with the speech-presence probability is developed, so that the proposed method achieves the goal of removing the residual noise between the harmonics of noisy speech.
Nonnegative HMM for Babble Noise Derived From Speech HMM: Application to Speech Enhancement
- Computer Science
- IEEE Transactions on Audio, Speech, and Language Processing
- 2013
Deriving a good model for multitalker babble noise can facilitate different speech processing algorithms, e.g., noise reduction, to reduce the so-called cocktail party difficulty. In the available…
On Training Targets for Supervised Speech Separation
- Computer Science, Physics
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2014
Results in various test conditions reveal that the two ratio mask targets, the IRM and the FFT-MASK, outperform the other targets in terms of objective intelligibility and quality metrics, and that masking based targets, in general, are significantly better than spectral envelope based targets.
Speech Enhancement Using Gaussian Scale Mixture Models
- Computer Science
- IEEE Transactions on Audio, Speech, and Language Processing
- 2010
The proposed methods were applied to enhance speech corrupted by Gaussian noise and speech-shaped noise and effectively reduced the SSN, which algorithms based on spectral analysis were not able to suppress.
Codebook driven short-term predictor parameter estimation for speech enhancement
- Computer Science
- IEEE Transactions on Audio, Speech, and Language Processing
- 2006
Experimental results show that the use of a priori information and the calculation of the instantaneous speech and noise excitation variances on a frame-by-frame basis result in good performance in both stationary and nonstationary noise conditions.
Speech enhancement based on log spectral envelope model and harmonicity-derived spectral mask, and its coupling with feature compensation
- Physics
- 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2011
The key to the method is its use of a harmonic structure to define the prior distribution of a spectral mask, which is used for both accurate noise estimation and attenuation; the paper also combines log mel-frequency feature enhancement with this method to take advantage of low dimensionality.
Speech enhancement based on minimum mean-square error estimation and supergaussian priors
- Computer Science
- IEEE Transactions on Speech and Audio Processing
- 2005
Compared to algorithms based on the Gaussian assumption, such as the Wiener filter or the Ephraim and Malah (1984) MMSE short-time spectral amplitude estimator, the estimators based on these supergaussian densities deliver an improved signal-to-noise ratio.
Analysis of the Decision-Directed SNR Estimator for Speech Enhancement With Respect to Low-SNR and Transient Conditions
- Computer Science
- IEEE Transactions on Audio, Speech, and Language Processing
- 2011
A systematic analysis of the performance of noise reduction algorithms in low signal-to-noise ratio (SNR) and transient conditions, where it is illustrated that achieving both a good preservation of speech onsets in transient conditions on one side and the suppression of musical noise on the other can be especially problematic when the decision-directed SNR estimation is used.