SEGAN: Speech Enhancement Generative Adversarial Network
@article{Pascual2017SEGANSE,
title={SEGAN: Speech Enhancement Generative Adversarial Network},
author={Santiago Pascual and Antonio Bonafonte and Joan Serr{\`a}},
journal={ArXiv},
year={2017},
volume={abs/1703.09452},
url={https://api.semanticscholar.org/CorpusID:12054873}
}
This work proposes the use of generative adversarial networks for speech enhancement. The model operates at the waveform level, is trained end-to-end, and incorporates 28 speakers and 40 different noise conditions into the same model, such that model parameters are shared across them.
Topics
Speech Enhancement GAN, Speech Enhancement, Composite Measure For Signal, Voice Bank Corpus, DEMAND Database, Segmental Signal-to-noise Ratio, Waveform Level, PReLUs, Speech Quality, Enhanced Speech
1,095 Citations
Time-domain speech enhancement using generative adversarial networks
- 2019
Computer Science
Multi-scale Generative Adversarial Networks for Speech Enhancement
- 2019
Computer Science
Speech Enhancement Multi-scale Generative Adversarial Networks (SEMGAN), whose generator and discriminator networks are structured on the basis of fully convolutional neural networks (FCNNs), achieve superior performance in comparison with the optimally modified log-spectral amplitude estimator (OMLSA) and SEGAN in different noisy conditions.
Towards Generalized Speech Enhancement with Generative Adversarial Networks
- 2019
Computer Science
This work extends a previous GAN-based speech enhancement system to deal with mixtures of four types of aggressive distortions, and proposes the addition of an adversarial acoustic regression loss that promotes a richer feature extraction at the discriminator.
Speech Enhancement via Residual Dense Generative Adversarial Network
- 2021
Computer Science
Simulations show that the proposed speech enhancement method, which uses a residual dense generative adversarial network to map the log-power spectrum of degraded speech to the clean one, can outperform existing GAN-based methods and a masking-based method in PESQ and other evaluation measures.
Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training
- 2018
Computer Science, Engineering
An adversarial training method is proposed to directly boost the noise robustness of acoustic models; it achieves average relative error rate reductions of 23.38% and 11.54% on the development and test sets, respectively.
Language and Noise Transfer in Speech Enhancement Generative Adversarial Network
- 2018
Computer Science, Linguistics
This work presents the results of adapting a speech enhancement generative adversarial network by fine-tuning the generator with small amounts of data, and investigates the minimum requirements to obtain a stable behavior in terms of several objective metrics in two very different languages.
CP-GAN: Context Pyramid Generative Adversarial Network for Speech Enhancement
- 2020
Computer Science
This work makes the first attempt to explore the global and local speech features for coarse-to-fine speech enhancement and introduces a Context Pyramid Generative Adversarial Network (CPGAN), which contains a densely-connected feature pyramid generator and a dynamic context granularity discriminator to better eliminate audio noise hierarchically.
Time-domain Speech Enhancement with Generative Adversarial Learning
- 2021
Computer Science
A new framework called Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN) is proposed. It extends the generative adversarial network in the time domain with metric evaluation to mitigate the scaling problem and provide model training stability, thus achieving performance improvement.
Data augmentation using generative adversarial networks for robust speech recognition
- 2019
Computer Science, Engineering
37 References
A Regression Approach to Speech Enhancement Based on Deep Neural Networks
- 2015
Computer Science
The proposed DNN approach can well suppress highly nonstationary noise, which is tough to handle in general, and is effective in dealing with noisy speech data recorded in real-world scenarios without the generation of the annoying musical artifact commonly observed in conventional enhancement methods.
Speech enhancement based on deep denoising autoencoder
- 2013
Computer Science
Experimental results show that increasing the depth of the DAE consistently increases performance when a large training data set is given, and that, compared with a minimum mean square error based speech enhancement algorithm, the proposed denoising DAE provides superior performance on the three objective evaluations.
Improved Techniques for Training GANs
- 2016
Computer Science
This work focuses on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic, and presents ImageNet samples with unprecedented resolution and shows that the methods enable the model to learn recognizable features of ImageNet classes.
Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech
Computer Science
Two different approaches for speech enhancement to train TTS systems are investigated, following conventional speech enhancement methods, and it is shown that the second approach results in larger MCEP distortion but smaller F0 errors.
WaveNet: A Generative Model for Raw Audio
- 2016
Computer Science
WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it is shown that it can be efficiently trained on data with tens of thousands of samples per second of audio, and can be employed as a discriminative model, returning promising results for phoneme recognition.
Context Encoders: Feature Learning by Inpainting
- 2016
Computer Science
It is found that a context encoder learns a representation that captures not just appearance but also the semantics of visual structures, and can be used for semantic inpainting tasks, either stand-alone or as initialization for non-parametric methods.
Least Squares Generative Adversarial Networks
- 2017
Computer Science, Mathematics
This paper proposes the Least Squares Generative Adversarial Networks (LSGANs), which adopt the least squares loss function for the discriminator, and shows that minimizing the objective function of LSGAN amounts to minimizing the Pearson χ² divergence.
Recurrent Neural Networks for Noise Reduction in Robust ASR
- 2012
Computer Science, Engineering
This work introduces a model which uses a deep recurrent autoencoder neural network to denoise input features for robust ASR, and demonstrates that the model is competitive with existing feature denoising approaches on the Aurora2 task and outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.
Speech Enhancement in Multiple-Noise Conditions Using Deep Neural Networks
- 2016
Computer Science, Engineering
This paper deals with improving speech quality in office environments, where multiple stationary as well as non-stationary noises can be simultaneously present in speech, and proposes several strategies based on Deep Neural Networks for speech enhancement in these scenarios.
Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR
- 2015
Computer Science, Engineering
It is demonstrated that LSTM speech enhancement, even when used 'naively' as front-end processing, delivers competitive results on the CHiME-2 speech recognition task.




