# A Bayesian Permutation training deep representation learning method for speech enhancement with variational autoencoder

@inproceedings{Xiang2022ABP,
title={A Bayesian Permutation training deep representation learning method for speech enhancement with variational autoencoder},
author={Yang Xiang and Jesper Lisby H{\o}jvang and Morten H{\o}jfeldt Rasmussen and Mads Gr{\ae}sb{\o}ll Christensen},
booktitle={ICASSP},
year={2022}
}
• Published in ICASSP, 24 January 2022
• Computer Science
Recently, variational autoencoder (VAE), a deep representation learning (DRL) model, has been used to perform speech enhancement (SE). However, to the best of our knowledge, current VAE-based SE methods only apply VAE to model the speech signal, while noise is modeled using the traditional non-negative matrix factorization (NMF) model. One of the most important reasons for using NMF is that these VAE-based methods cannot disentangle the speech and noise latent variables from the observed signal…
1 Citation

A deep representation learning speech enhancement method using $\beta$-VAE
• Computer Science
• 2022
The proposed β-VAE strategy can be used to optimize the DNN’s structure and acquire better speech and noise latent representation than PVAE, and obtains a higher scale-invariant signal-to-distortion ratio, speech quality, and speech intelligibility.

## References

A VARIANCE MODELING FRAMEWORK BASED ON VARIATIONAL AUTOENCODERS FOR SPEECH ENHANCEMENT
• Computer Science
2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP)
• 2018
A Monte Carlo expectation-maximization algorithm is developed for inferring the latent variables in the variational autoencoder and estimating the unsupervised model parameters, and the proposed method is shown to outperform a semi-supervised NMF baseline and a state-of-the-art fully supervised deep learning approach.
Variational Autoencoder for Speech Enhancement with a Noise-Aware Encoder
• Computer Science
ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
• 2021
It is shown that the proposed noise-aware VAE outperforms the standard VAE in terms of overall distortion without increasing the number of model parameters, and is capable of generalizing to unseen noise conditions better than a supervised feedforward deep neural network (DNN).
Statistical Speech Enhancement Based on Probabilistic Integration of Variational Autoencoder and Non-Negative Matrix Factorization
• Computer Science
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
• 2018
This paper presents a statistical method of single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech; the method outperformed the conventional DNN-based method in unseen noisy environments.
Guided Variational Autoencoder for Speech Enhancement with a Supervised Classifier
• Computer Science
ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
• 2021
Provided that the label better informs the latent distribution and that the classifier achieves good performance, the proposed approach outperforms the standard variational autoencoder and a conventional neural network-based supervised approach.
Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation
• Computer Science
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
• 2017
This paper addresses the unsupervised domain adaptation problem for robust speech recognition, where both source and target domain speech are available, but word transcripts are only available for the source domain speech.
A Parallel-Data-Free Speech Enhancement Method Using Multi-Objective Learning Cycle-Consistent Generative Adversarial Network
• Computer Science
IEEE/ACM Transactions on Audio, Speech, and Language Processing
• 2020
A novel parallel-data-free speech enhancement method, in which the cycle-consistent generative adversarial network (CycleGAN) and multi-objective learning are employed, which is effective at improving speech quality and intelligibility when the networks are trained under the parallel data.
A Regression Approach to Speech Enhancement Based on Deep Neural Networks
• Computer Science
IEEE/ACM Transactions on Audio, Speech, and Language Processing
• 2015
The proposed DNN approach can well suppress highly nonstationary noise, which is tough to handle in general, and is effective in dealing with noisy speech data recorded in real-world scenarios without the generation of the annoying musical artifact commonly observed in conventional enhancement methods.
Semi-supervised Multichannel Speech Enhancement with Variational Autoencoders and Non-negative Matrix Factorization
• Computer Science
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
• 2019
A Monte Carlo expectation-maximization algorithm is developed and it is experimentally shown that the proposed approach outperforms its NMF-based counterpart, where speech is modeled using supervised NMF.
An NMF-HMM Speech Enhancement Method Based on Kullback-Leibler Divergence
• Computer Science
INTERSPEECH
• 2020
A novel supervised Non-negative Matrix Factorization (NMF) speech enhancement method, which is based on Hidden Markov Model (HMM) and Kullback-Leibler (KL) divergence (NMF-HMM), where the sum of Poisson is used as the observation model for each state of HMM.
Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation
• Computer Science
IEEE/ACM Transactions on Audio, Speech, and Language Processing
• 2019
A fully convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation, which significantly outperforms previous time–frequency masking methods in separating two- and three-speaker mixtures.