SEGAN: Speech Enhancement Generative Adversarial Network

Santiago Pascual, Antonio Bonafonte, Joan Serrà
Current speech enhancement techniques operate in the spectral domain and/or exploit some higher-level feature. In contrast to current techniques, we operate at the waveform level, training the model end-to-end, and incorporate 28 speakers and 40 different noise conditions into the same model, such that model parameters are shared across them. We evaluate the proposed model using an independent, unseen test set with two speakers and 20 alternative noise conditions. The enhanced samples confirm…
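The waveform-level adversarial setup described above can be sketched minimally as follows. The linear generator/discriminator, toy data, and chunk handling below are illustrative stand-ins (SEGAN itself uses deep one-dimensional convolutional networks); the least-squares (LSGAN-style) adversarial losses and the L1 regression term weighted by λ follow the approach described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 16384                       # waveform chunk length, as in SEGAN


def generator(noisy, z, w):
    """Toy stand-in generator: enhanced waveform from (noisy, latent z)."""
    return w * noisy + 0.01 * z


def discriminator(x, noisy, v):
    """Toy stand-in discriminator: scalar score of a (candidate, noisy) pair."""
    return float(v @ np.concatenate([x, noisy]) / (2 * T))


clean = rng.standard_normal(T)
noisy = clean + 0.3 * rng.standard_normal(T)   # synthetic noisy mixture
z = rng.standard_normal(T)                     # latent input to the generator
w = 1.0                                        # generator parameter (toy)
v = rng.standard_normal(2 * T)                 # discriminator parameters (toy)

enhanced = generator(noisy, z, w)

# Least-squares adversarial losses: D pushes real pairs toward 1, fakes toward 0.
d_real = discriminator(clean, noisy, v)
d_fake = discriminator(enhanced, noisy, v)
d_loss = 0.5 * (d_real - 1.0) ** 2 + 0.5 * d_fake ** 2

# Generator loss: fool D, plus an L1 term pulling the output toward clean speech.
lam = 100.0
g_loss = 0.5 * (d_fake - 1.0) ** 2 + lam * np.mean(np.abs(enhanced - clean))

print(enhanced.shape, d_loss >= 0.0, g_loss >= 0.0)
```

In training, the two losses would be minimized alternately with respect to the generator and discriminator parameters; the sketch only evaluates them once to show the objective's structure.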


Multi-scale Generative Adversarial Networks for Speech Enhancement
Speech Enhancement Multi-scale Generative Adversarial Networks (SEMGAN), whose generator and discriminator are structured as fully convolutional neural networks (FCNNs), achieve superior performance compared with the optimally modified log-spectral amplitude estimator (OMLSA) and SEGAN under different noisy conditions.
Speech Enhancement via Residual Dense Generative Adversarial Network
Simulations show that the proposed speech enhancement method, which uses a residual dense generative adversarial network to map the log-power spectrum of degraded speech to that of clean speech, outperforms existing GAN-based and masking-based methods in PESQ and other evaluation indexes.
Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training
An adversarial training method that directly boosts the noise robustness of an acoustic model, achieving average relative error rate reductions of 23.38% and 11.54% on the development and test sets, respectively.
Language and Noise Transfer in Speech Enhancement Generative Adversarial Network
This work presents the results of adapting a speech enhancement generative adversarial network by fine-tuning the generator with small amounts of data, and investigates the minimum requirements to obtain stable behavior in terms of several objective metrics in two very different languages.
CP-GAN: Context Pyramid Generative Adversarial Network for Speech Enhancement
This work makes the first attempt to explore global and local speech features for coarse-to-fine speech enhancement, introducing a Context Pyramid Generative Adversarial Network (CP-GAN) that contains a densely connected feature pyramid generator and a dynamic context granularity discriminator to better eliminate audio noise hierarchically.
Data augmentation using generative adversarial networks for robust speech recognition
Speech Enhancement via Generative Adversarial LSTM Networks
  • Yang Xiang, C. Bao
  • Computer Science
    2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC)
  • 2018
Experimental results indicate that the proposed novel speech enhancement framework not only improves the quality and intelligibility of noisy speech but is also competitive with other deep learning-based approaches.
A novel architecture combining the traditional acoustic loss function and the GAN's discriminative loss under a multi-task learning (MTL) framework is proposed; it improves the stability of the GAN while producing samples whose distribution is closer to that of natural speech.
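The multi-task objective described above can be sketched as a weighted mix of a conventional acoustic regression loss and an adversarial term. The function name, the mixing weight `alpha`, and the toy inputs below are illustrative assumptions, not details from the paper:

```python
import numpy as np


def mtl_loss(enhanced, clean, d_score_fake, alpha=0.5):
    """Hypothetical MTL mix: acoustic MSE plus a least-squares GAN term."""
    acoustic = np.mean((enhanced - clean) ** 2)   # regression toward clean speech
    adversarial = (d_score_fake - 1.0) ** 2       # push D's score on fakes toward 1
    return alpha * acoustic + (1.0 - alpha) * adversarial


rng = np.random.default_rng(1)
clean = rng.standard_normal(256)
enhanced = clean + 0.1 * rng.standard_normal(256)
loss = mtl_loss(enhanced, clean, d_score_fake=0.4)
print(loss >= 0.0)
```

The acoustic term stabilizes training by anchoring the generator to a regression target, while the adversarial term pushes the output distribution toward natural speech.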


A Regression Approach to Speech Enhancement Based on Deep Neural Networks
The proposed DNN approach can well suppress highly nonstationary noise, which is tough to handle in general, and is effective on noisy speech recorded in real-world scenarios, without generating the annoying musical artifacts commonly observed in conventional enhancement methods.
Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech
Two different approaches to speech enhancement for training TTS systems are investigated, following conventional speech enhancement methods; results show that the second approach yields larger MCEP distortion but smaller F0 errors.
Speech enhancement based on deep denoising autoencoder
Experimental results show that increasing the depth of the DAE consistently improves performance when a large training set is available, and that, compared with a minimum mean square error-based speech enhancement algorithm, the proposed denoising DAE provides superior performance on the three objective evaluations.
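The denoising-autoencoder idea above can be sketched with a tiny one-hidden-layer network trained to map noisy feature frames back to their clean versions. The dimensions, noise level, and training setup below are illustrative assumptions, far smaller than any real system:

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, N = 20, 8, 500                      # feature dim, hidden dim, frames
clean = rng.standard_normal((N, D))
noisy = clean + 0.3 * rng.standard_normal((N, D))

W1 = 0.1 * rng.standard_normal((D, H))    # encoder weights
W2 = 0.1 * rng.standard_normal((H, D))    # decoder weights
lr = 0.01
losses = []

for _ in range(200):
    h = np.tanh(noisy @ W1)               # encode the NOISY frame
    recon = h @ W2                        # decode
    err = recon - clean                   # denoising target is the CLEAN frame
    losses.append(float(np.mean(err ** 2)))
    # Plain gradient descent on the squared-error loss.
    gW2 = h.T @ err / N
    gW1 = noisy.T @ ((err @ W2.T) * (1 - h ** 2)) / N
    W1 -= lr * gW1
    W2 -= lr * gW2

print(round(losses[0], 3), round(losses[-1], 3), losses[-1] < losses[0])
```

The key design point is that the input is corrupted but the target is clean, so the network learns a denoising mapping rather than plain reconstruction; stacking such layers gives the deeper DAEs whose benefit the summary reports.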
Improved Techniques for Training GANs
This work focuses on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic, and presents ImageNet samples with unprecedented resolution and shows that the methods enable the model to learn recognizable features of ImageNet classes.
WaveNet: A Generative Model for Raw Audio
WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it is shown that it can be efficiently trained on data with tens of thousands of samples per second of audio, and can be employed as a discriminative model, returning promising results for phoneme recognition.
Context Encoders: Feature Learning by Inpainting
It is found that a context encoder learns a representation that captures not just appearance but also the semantics of visual structures, and can be used for semantic inpainting tasks, either stand-alone or as initialization for non-parametric methods.
Recurrent Neural Networks for Noise Reduction in Robust ASR
This work introduces a model that uses a deep recurrent autoencoder neural network to denoise input features for robust ASR, demonstrating that the model is competitive with existing feature denoising approaches on the Aurora2 task and outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.
Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR
It is demonstrated that LSTM speech enhancement, even when used 'naively' as front-end processing, delivers competitive results on the CHiME-2 speech recognition task.
Overview of speech enhancement techniques for automatic speaker recognition
A comparative performance analysis of single-channel, dual-channel and multi-channel speech enhancement techniques, with different types of noise at different SNRs, as a pre-processing stage to an ergodic HMM-based speaker recognizer, is presented.
Image-to-Image Translation with Conditional Adversarial Networks
Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.