SEGAN: Speech Enhancement Generative Adversarial Network

@inproceedings{Pascual2017SEGANSE,
  title={SEGAN: Speech Enhancement Generative Adversarial Network},
  author={Santiago Pascual and Antonio Bonafonte and Joan Serr{\`a}},
  booktitle={INTERSPEECH},
  year={2017}
}
Current speech enhancement techniques operate on the spectral domain and/or exploit some higher-level feature. [...] In contrast to current techniques, we operate at the waveform level, training the model end-to-end, and incorporate 28 speakers and 40 different noise conditions into the same model, such that model parameters are shared across them. We evaluate the proposed model using an independent, unseen test set with two speakers and 20 alternative noise conditions. The enhanced samples confirm […]
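The waveform-level, end-to-end setup described above can be sketched with toy stand-ins: a generator that maps a raw noisy waveform to a same-length enhanced waveform, and a discriminator that scores raw chunks. Every name, shape, and parameter below is an illustrative assumption (the actual SEGAN networks are deep 1-D convolutional encoder-decoders), not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(noisy, kernel):
    # "Enhance" the raw waveform with a 1-D convolution (same-length output),
    # standing in for SEGAN's convolutional encoder-decoder generator.
    return np.convolve(noisy, kernel, mode="same")

def discriminator(wave, weights):
    # Score a raw waveform chunk: high for clean, low for enhanced/fake.
    return float(wave @ weights)

clean = np.sin(0.1 * np.arange(256))            # toy clean utterance
noisy = clean + 0.3 * rng.standard_normal(256)  # one of many noise conditions

kernel = np.ones(5) / 5.0                       # stand-in generator parameters
weights = rng.standard_normal(256) / 16.0       # stand-in discriminator parameters

enhanced = generator(noisy, kernel)
real_score = discriminator(clean, weights)
fake_score = discriminator(enhanced, weights)
print(enhanced.shape, real_score, fake_score)
```

Because both networks consume raw samples rather than spectral features, a single parameter set can be shared across all speakers and noise conditions seen in training, which is the point the abstract emphasizes.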
Multi-scale Generative Adversarial Networks for Speech Enhancement
Speech Enhancement Multi-scale Generative Adversarial Networks (SEMGAN), whose generator and discriminator networks are structured on the basis of fully convolutional neural networks (FCNNs), gains superior performance compared with the optimally modified log-spectral amplitude estimator (OMLSA) and SEGAN in different noisy conditions.
Time-domain speech enhancement using generative adversarial networks
This work proposes a generative approach to regenerate corrupted signals into a clean version by using generative adversarial networks on the raw signal, and demonstrates the applicability of the approach for more generalized speech enhancement, where it has to regenerate voices from whispered signals.
Towards Generalized Speech Enhancement with Generative Adversarial Networks
This work extends a previous GAN-based speech enhancement system to deal with mixtures of four types of aggressive distortions, and proposes the addition of an adversarial acoustic regression loss that promotes a richer feature extraction at the discriminator.
Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training
An adversarial training method is proposed to directly boost the noise robustness of the acoustic model, achieving average relative error rate reductions of 23.38% and 11.54% on the development and test sets, respectively.
Language and Noise Transfer in Speech Enhancement Generative Adversarial Network
This work presents the results of adapting a speech enhancement generative adversarial network by fine-tuning the generator with small amounts of data, and investigates the minimum requirements to obtain a stable behavior in terms of several objective metrics in two very different languages.
Time-domain Speech Enhancement with Generative Adversarial Learning
A new framework called Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN) is proposed, which extends the generative adversarial network to the time domain with metric evaluation to mitigate the scaling problem and provide model training stability, thus achieving performance improvement.
CP-GAN: Context Pyramid Generative Adversarial Network for Speech Enhancement
This work makes the first attempt to explore global and local speech features for coarse-to-fine speech enhancement, and introduces a Context Pyramid Generative Adversarial Network (CP-GAN), which contains a densely-connected feature pyramid generator and a dynamic context granularity discriminator to better eliminate audio noise hierarchically.
Data augmentation using generative adversarial networks for robust speech recognition
The experiments show that the new data augmentation approaches improve performance under all noisy conditions, including additive noise, channel distortion, and reverberation, and that a relative 6% to 14% WER reduction can be obtained with an advanced acoustic model.
Improving generative adversarial networks for speech enhancement through regularization of latent representations
A new network architecture and loss function based on SEGAN are proposed for speech enhancement, called high-level GAN (HLGAN), which uses parallel noisy and clean speech signals as input in the training phase instead of only noisy speech signals, and can effectively enhance the speech signals of two low-resource languages simultaneously.
Speech Enhancement via Generative Adversarial LSTM Networks
  • Yang Xiang, C. Bao
  • Computer Science
  • 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC)
  • 2018
Experimental results indicate that the proposed framework for speech enhancement not only improves the quality and intelligibility of noisy speech, but is also competitive with other deep learning-based approaches.

References

Showing 1-10 of 41 references
A Regression Approach to Speech Enhancement Based on Deep Neural Networks
The proposed DNN approach can well suppress highly nonstationary noise, which is tough to handle in general, and is effective in dealing with noisy speech data recorded in real-world scenarios without generating the annoying musical artifact commonly observed in conventional enhancement methods.
Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech
This work tackles the problem of speaker interpolation by adding a new output layer (a-layer) on top of the multi-output branches; experiments show that the a-layer can effectively learn to interpolate acoustic features between speakers.
Speech enhancement based on deep denoising autoencoder
Experimental results show that increasing the depth of the DAE consistently improves performance when a large training data set is given, and that, compared with a minimum mean square error based speech enhancement algorithm, the proposed denoising DAE provides superior performance on the three objective evaluations.
Improved Techniques for Training GANs
This work focuses on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic; it presents ImageNet samples with unprecedented resolution and shows that the methods enable the model to learn recognizable features of ImageNet classes.
Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech
Two different approaches for speech enhancement to train TTS systems are investigated, following conventional speech enhancement methods; results show that the second approach results in larger MCEP distortion but smaller F0 errors.
WaveNet: A Generative Model for Raw Audio
WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it is shown that it can be efficiently trained on data with tens of thousands of samples per second of audio, and can be employed as a discriminative model, returning promising results for phoneme recognition.
Context Encoders: Feature Learning by Inpainting
It is found that a context encoder learns a representation that captures not just appearance but also the semantics of visual structures, and can be used for semantic inpainting tasks, either stand-alone or as initialization for non-parametric methods.
Least Squares Generative Adversarial Networks
This paper proposes the Least Squares Generative Adversarial Networks (LSGANs), which adopt the least squares loss function for the discriminator, and shows that minimizing the LSGAN objective yields minimization of the Pearson χ² divergence.
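The least-squares objective summarized above replaces the sigmoid cross-entropy loss of the original GAN with squared distances to target labels. A minimal sketch, assuming the common label convention a = 0, b = c = 1 (function names and the toy score values are illustrative):

```python
import numpy as np

def lsgan_d_loss(real_scores, fake_scores, a=0.0, b=1.0):
    # Discriminator: push real scores toward label b, fake scores toward a.
    return 0.5 * np.mean((real_scores - b) ** 2) + 0.5 * np.mean((fake_scores - a) ** 2)

def lsgan_g_loss(fake_scores, c=1.0):
    # Generator: push the discriminator's scores on fakes toward label c.
    return 0.5 * np.mean((fake_scores - c) ** 2)

real = np.array([0.9, 1.1, 0.8])   # toy discriminator outputs on real data
fake = np.array([0.2, -0.1, 0.3])  # toy discriminator outputs on generated data
print(lsgan_d_loss(real, fake), lsgan_g_loss(fake))
```

Because the penalty grows quadratically with distance from the label, samples that are already classified correctly but lie far from the decision boundary still receive gradient, which is what makes this loss relevant to GAN-based enhancement systems such as SEGAN.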
Generative Adversarial Nets
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Recurrent Neural Networks for Noise Reduction in Robust ASR
This work introduces a model which uses a deep recurrent autoencoder neural network to denoise input features for robust ASR, demonstrates that the model is competitive with existing feature denoising approaches on the Aurora2 task, and outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.