A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement

@inproceedings{Leglaive2018AVM,
  title={A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement},
  author={Simon Leglaive and Laurent Girin and Radu Horaud},
  booktitle={2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP)},
  year={2018},
  pages={1--6}
}
  • Published 1 September 2018
  • Computer Science, Engineering, Mathematics
In this paper we address the problem of enhancing speech signals in noisy mixtures using a source separation approach. We explore the use of neural networks as an alternative to a popular speech variance model based on supervised non-negative matrix factorization (NMF). More precisely, we use a variational autoencoder as a speaker-independent supervised generative speech model, highlighting the conceptual similarities that this approach shares with its NMF-based counterpart. In order to be free…
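The generative speech model described in the abstract, in which a VAE decoder maps a latent vector to a frequency-dependent variance for zero-mean complex Gaussian STFT coefficients, can be sketched as follows. This is a minimal illustration only: the dimensions and the random "decoder" weights are hypothetical stand-ins for the trained network used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, for illustration only
F = 257   # frequency bins of one STFT frame
L = 16    # latent dimension

# Toy "decoder": maps a latent vector z to per-frequency speech variances.
# In the paper this is a trained neural network; random weights stand in here.
W = rng.standard_normal((F, L)) * 0.1
b = np.zeros(F)

def decode_variance(z):
    """Return sigma^2_f(z) > 0 for each frequency bin f."""
    return np.exp(W @ z + b)  # exp keeps variances strictly positive

# Generative model of one clean-speech STFT frame:
#   z ~ N(0, I),   s_f | z ~ N_c(0, sigma^2_f(z))  (circular complex Gaussian)
z = rng.standard_normal(L)
var = decode_variance(z)
s = np.sqrt(var / 2) * (rng.standard_normal(F) + 1j * rng.standard_normal(F))
```

The conceptual link to NMF is that both approaches parameterize the variance of a zero-mean complex Gaussian on the STFT coefficients; NMF constrains the variance to a low-rank nonnegative factorization, while the VAE replaces it with a nonlinear decoder.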
Citations

Adaptive Neural Speech Enhancement with a Denoising Variational Autoencoder
This paper presents a neural speech enhancement method with a statistical feedback mechanism based on a denoising variational autoencoder (VAE), which outperforms the existing mask-based and generative enhancement methods in unknown conditions.
Speech Enhancement with Variational Autoencoders and Alpha-stable Distributions
This work proposes a noise model based on alpha-stable distributions, instead of the more conventional Gaussian non-negative matrix factorization approach found in previous studies, and develops a Monte Carlo expectation-maximization algorithm for estimating the model parameters at test time.
Guided Variational Autoencoder for Speech Enhancement with a Supervised Classifier
Provided that the label better informs the latent distribution and that the classifier achieves good performance, the proposed approach outperforms the standard variational autoencoder and a conventional neural network-based supervised approach.
Unsupervised Speech Enhancement using Dynamical Variational Auto-Encoders
An unsupervised speech enhancement algorithm is proposed based on the most general form of dynamical VAEs (DVAEs), combined with a noise model based on nonnegative matrix factorization and a variational expectation-maximization (VEM) algorithm to perform speech enhancement.
Semi-supervised Multichannel Speech Enhancement with Variational Autoencoders and Non-negative Matrix Factorization
A Monte Carlo expectation-maximization algorithm is developed, and it is experimentally shown that the proposed approach outperforms its NMF-based counterpart, where speech is modeled using supervised NMF.
Deep Variational Generative Models for Audio-Visual Speech Separation
The experiments show that the proposed unsupervised VAE-based method yields better separation performance than NMF-based approaches as well as a supervised deep learning-based technique.
Variational Autoencoder for Speech Enhancement with a Noise-Aware Encoder
It is shown that the proposed noise-aware VAE outperforms the standard VAE in terms of overall distortion without increasing the number of model parameters, and is capable of generalizing to unseen noise conditions better than a supervised feedforward deep neural network (DNN).
Cauchy Multichannel Speech Enhancement with a Deep Speech Prior
A semi-supervised multichannel speech enhancement system is proposed, based on a probabilistic model that assumes both speech and noise follow the heavy-tailed multivariate complex Cauchy distribution, which makes it more robust against non-stationary noise.
A Recurrent Variational Autoencoder for Speech Enhancement
A variational expectation-maximization algorithm is proposed in which the encoder of the RVAE is fine-tuned at test time to approximate the distribution of the latent variables given the noisy speech observations; this fine-tuning is shown to improve the speech enhancement results.
A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders
A variational inference method iteratively estimates the power spectrogram of the clean speech; the encoder of the pre-learned VAE can be used to compute the variational approximation of the true posterior distribution, under the very same assumption made to train VAEs.

References

Showing 1-10 of 37 references
Statistical Speech Enhancement Based on Probabilistic Integration of Variational Autoencoder and Non-Negative Matrix Factorization
This paper presents a statistical method for single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech, and shows that it outperformed the conventional DNN-based method in unseen noisy environments.
A Regression Approach to Speech Enhancement Based on Deep Neural Networks
The proposed DNN approach can effectively suppress highly nonstationary noise, which is tough to handle in general, and is effective in dealing with noisy speech data recorded in real-world scenarios without generating the annoying musical artifact commonly observed in conventional enhancement methods.
Multichannel Audio Source Separation With Deep Neural Networks
This article proposes a framework where deep neural networks are used to model the source spectra and combined with the classical multichannel Gaussian model to exploit the spatial information, and presents its application to a speech enhancement problem.
Supervised Speech Separation Based on Deep Learning: An Overview
  • DeLiang Wang, Jitong Chen
  • Computer Science, Medicine
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • 2018
This paper provides a comprehensive overview of the research on deep learning based supervised speech separation in the last several years, and provides a historical perspective on how advances are made.
Gaussian Processes for Underdetermined Source Separation
A general formulation of underdetermined source separation as a problem involving GP regression is proposed; an approximation is introduced to make the GP models tractable for very large signals, and it is shown that computations for regularly sampled and locally stationary GPs can be done very efficiently in the frequency domain.
Probabilistic Modeling Paradigms for Audio Source Separation
This chapter provides a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models, and discusses promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems.
A neural network alternative to non-negative audio models
A neural network that can act as an equivalent to a Non-Negative Matrix Factorization (NMF) is presented, and it is shown how it can be used to perform supervised source separation.
Supervised and Semi-supervised Separation of Sounds from Single-Channel Mixtures
A sparse latent variable model that can learn sounds based on their distribution of time/frequency energy is presented, which can be used to extract known types of sounds from mixtures in two scenarios.
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Auto-Encoding Variational Bayes
A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.