A Fully Convolutional Neural Network for Speech Enhancement

@inproceedings{Park2017AFC,
  title={A Fully Convolutional Neural Network for Speech Enhancement},
  author={Se Rim Park and Jinwon Lee},
  booktitle={INTERSPEECH},
  year={2017}
}
In hearing aids, the presence of babble noise greatly degrades the intelligibility of human speech. However, removing the babble without creating artifacts in the speech is a challenging task in a low-SNR environment. Here, we sought to solve the problem by finding a "mapping" between noisy speech spectra and clean speech spectra via supervised learning. Specifically, we propose using fully convolutional neural networks, which have fewer parameters than fully connected…
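
The abstract's point about parameter count can be illustrated with a rough sketch. The layer sizes below (129 frequency bins, 8 context frames, 18 filters of width 9) are hypothetical values chosen for illustration and are not taken from the paper; the sketch only compares how weights scale in one fully connected layer versus one convolutional layer applied along the frequency axis.

```python
# Rough parameter-count comparison between one fully connected layer and
# one 1-D convolutional layer. All sizes are illustrative assumptions,
# not the configuration used by Park and Lee.

def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv1d_params(kernel_size: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of a 1-D convolution (shared across positions)."""
    return kernel_size * c_in * c_out + c_out


N_BINS = 129    # hypothetical number of frequency bins per frame
N_FRAMES = 8    # hypothetical number of noisy context frames fed to the net

# Fully connected: flatten all frames, map to a hidden layer of the same size.
fc_count = dense_params(N_BINS * N_FRAMES, N_BINS * N_FRAMES)

# Convolutional: treat frames as input channels and slide 18 filters of
# width 9 along the frequency axis; the weights do not depend on N_BINS.
conv_count = conv1d_params(kernel_size=9, c_in=N_FRAMES, c_out=18)

print(f"fully connected layer: {fc_count:,} parameters")
print(f"convolutional layer:   {conv_count:,} parameters")
```

Because the convolutional weights are shared across frequency positions, their count is independent of the number of bins, which is the general reason a fully convolutional network can be much smaller than a fully connected one of similar input size.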

A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement
TLDR
This paper incorporates a convolutional encoder-decoder (CED) and long short-term memory (LSTM) into the CRN architecture, which leads to a causal system that is naturally suitable for real-time processing.
A Fully Convolutional Neural Network for Complex Spectrogram Processing in Speech Enhancement
TLDR
The proposed CNN consists of one-dimensional (1-d) convolution and frequency-dilated 2-d convolution, and incorporates a residual learning and skip-connection structure, and achieves a better performance with fewer parameters.
Speech Enhancement using Convolutional Neural Network with Skip Connections
TLDR
Experimental results demonstrate that the proposed CNN structure provides better denoising ability than Wiener filtering, even when the model is tested on speech data and noise not included in the training set.
Regression-based speech enhancement by convolutional neural network
TLDR
A regression-based convolutional neural network model is proposed for speech enhancement to remove noise from conversations, and the results are evaluated by perceptual evaluation of speech quality and short-time objective intelligibility.
Separated Noise Suppression and Speech Restoration: LSTM-Based Speech Enhancement in Two Stages
TLDR
This work addresses the problem that speech distortions can be introduced when employing NNs trained to provide strong noise suppression. It proposes first suppressing noise and subsequently restoring speech, with specifically chosen NN topologies for each of these distinct tasks.
Speech Denoising with Auditory Models
TLDR
The results show that deep features can guide speech enhancement, but suggest that they do not yet outperform simple alternatives that do not involve learned features.
Redundant Convolutional Network With Attention Mechanism For Monaural Speech Enhancement
TLDR
This study introduces an attention mechanism into the convolutional encoder-decoder model that adaptively filters channel-wise feature responses by explicitly modeling attentions (on speech versus noise signals) between channels.
Gated Residual Networks with Dilated Convolutions for Supervised Speech Separation
TLDR
This work proposes a novel convolutional neural network (CNN) to deal with noise- and speaker-independent speech separation and finds that the proposed model consistently outperforms a state-of-the-art long short-term memory (LSTM) based model in terms of objective speech intelligibility and quality.
Speech Enhancement via Deep Spectrum Image Translation Network
TLDR
A novel speech enhancement approach is suggested that uses a deep spectrum image translation network, in which a deep fully convolutional network known as VGG19 is embedded in the encoder part of an image-to-image translation network, i.e., U-Net.
Speech Enhancement by Multiple Propagation through the Same Neural Network
TLDR
Previous efforts are extended to demonstrate how multi-forward-pass speech enhancement can be successfully applied to other architectures, namely the ResBLSTM and Transformer-Net. The results show that performing speech enhancement up to five times still brings improvements to speech intelligibility, but the gain becomes smaller with each iteration.

References

SHOWING 1-10 OF 31 REFERENCES
Learning spectral mapping for speech dereverberation
  • Kun Han, Yuxuan Wang, Deliang Wang
  • Physics, Computer Science
    2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2014
TLDR
It is demonstrated that distortion caused by reverberation is substantially attenuated by the DNN, whose outputs can be resynthesized into the dereverberated speech signal.
A Regression Approach to Speech Enhancement Based on Deep Neural Networks
TLDR
The proposed DNN approach can well suppress highly nonstationary noise, which is tough to handle in general, and is effective in dealing with noisy speech data recorded in real-world scenarios without the generation of the annoying musical artifact commonly observed in conventional enhancement methods.
Convolutional Neural Networks for Speech Recognition
TLDR
It is shown that further error rate reduction can be obtained by using convolutional neural networks (CNNs), and a limited-weight-sharing scheme is proposed that can better model speech features.
Complex recurrent neural networks for denoising speech signals
  • K. Osako, Rita Singh, B. Raj
  • Computer Science, Engineering
    2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
  • 2015
TLDR
Noise reduction experiments on noisy speech, both with digitally added synthetic noise and real car noise, show that the proposed algorithm can recover much of the degradation caused by the noise.
Speech enhancement with weighted denoising auto-encoder
TLDR
A novel speech enhancement method with Weighted Denoising Auto-encoder (WDA) is proposed, which achieves a similar amount of noise reduction in both white and colored noise with less distortion of the speech signal.
Enhancement and bandwidth compression of noisy speech
TLDR
An overview is provided of the variety of techniques that have been proposed for enhancement and bandwidth compression of speech degraded by additive background noise. It suggests a unifying framework in which the relationships between these systems are more visible and which may point to fruitful directions for further research.
Suppression of acoustic noise in speech using spectral subtraction
TLDR
A stand-alone noise suppression algorithm that resynthesizes a speech waveform and can be used as a pre-processor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.
Babble Noise: Modeling, Analysis, and Applications
TLDR
This study represents effectively the first effort in developing an overall model for speech babble, and with this, contributions are made for speech system robustness in noise.
A short-time objective intelligibility measure for time-frequency weighted noisy speech
TLDR
An objective intelligibility measure is presented that shows high correlation (rho=0.95) with the intelligibility of both noisy and TF-weighted noisy speech, and significantly better performance than three other, more sophisticated, objective measures.
A signal subspace approach for speech enhancement
TLDR
The popular spectral subtraction speech enhancement approach is shown to be a signal subspace approach which is optimal in an asymptotic (large sample) linear minimum mean square error sense, assuming the signal and noise are stationary.