Raw waveform-based speech enhancement by fully convolutional networks
@article{Fu2017RawWS,
title={Raw waveform-based speech enhancement by fully convolutional networks},
author={Szu-Wei Fu and Yu Tsao and Xugang Lu and Hisashi Kawai},
journal={2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)},
year={2017},
pages={6--12},
url={https://api.semanticscholar.org/CorpusID:15088220}
}
The proposed fully convolutional network (FCN) model can not only effectively recover the waveforms but also outperform the log-power spectrum (LPS)-based deep neural network (DNN) baseline in terms of short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ).
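What makes the model "fully convolutional" is that it has no fully connected layers. Every layer is a convolution, so the same small set of weights slides over the whole waveform, and the output has the same length as the input. The sketch below is a minimal NumPy illustration of that waveform-in, waveform-out mapping under stated assumptions: the channel counts, kernel size, ReLU hidden activations, tanh output, and random (untrained) weights are hypothetical choices, not the configuration reported in the paper.

```python
import numpy as np


def conv1d_same(x, kernels, bias):
    """'Same'-padded 1-D convolution (cross-correlation, as in most deep-learning layers).

    x: (in_channels, T); kernels: (out_channels, in_channels, K) with odd K;
    bias: (out_channels,). Returns an array of shape (out_channels, T).
    """
    out_ch, in_ch, k = kernels.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))  # zero-pad the time axis only
    t = x.shape[1]
    y = np.zeros((out_ch, t))
    for o in range(out_ch):
        for i in range(in_ch):
            y[o] += np.correlate(xp[i], kernels[o, i], mode="valid")
        y[o] += bias[o]
    return y


def fcn_forward(waveform, layers):
    """Map a 1-D waveform to a waveform of the same length using only conv layers."""
    h = waveform[np.newaxis, :]           # (1, T): a single input channel
    for idx, (w, b) in enumerate(layers):
        h = conv1d_same(h, w, b)
        if idx < len(layers) - 1:
            h = np.maximum(h, 0.0)        # hidden activation (assumed ReLU)
    return np.tanh(h[0])                  # one output channel, squashed to [-1, 1]


# Hypothetical, untrained network: 1 -> 8 -> 8 -> 1 channels, kernel size 11.
rng = np.random.default_rng(0)
channels = [1, 8, 8, 1]
kernel_size = 11
layers = [
    (rng.normal(0.0, 0.1, (c_out, c_in, kernel_size)), np.zeros(c_out))
    for c_in, c_out in zip(channels[:-1], channels[1:])
]

noisy = rng.normal(0.0, 0.3, 16000)       # stand-in for 1 s of 16 kHz noisy speech
enhanced = fcn_forward(noisy, layers)
print(noisy.shape, enhanced.shape)        # (16000,) (16000,)
```

Because no layer has a size tied to T, the same `layers` list also accepts a 123-sample or a 48000-sample input without modification.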
Topics
Fully Convolutional Networks, Deep Neural Networks, Convolutional Neural Network, Denoising Method, Speech Enhancement, Convolutional, Model Parameters, Perceptual Evaluation of Speech Quality, Short-time Objective Intelligibility
188 Citations
End-to-End Speech Enhancement Using Fully Convolutional Networks with Skip Connections
- 2019
Computer Science
A fully convolutional network with skip connections (SC-FCN) for end-to-end speech enhancement is proposed, which not only avoids fixed time-frequency transformation but also allows modelling phase information.
Single Channel Speech Enhancement Using Temporal Convolutional Recurrent Neural Networks
- 2019
Computer Science
The temporal convolutional recurrent network (TCRN), an end-to-end model that directly maps noisy waveforms to clean waveforms, is proposed; it is able to efficiently and effectively leverage both short-term and long-term information.
A time-frequency smoothing neural network for speech enhancement
- 2020
Computer Science
Multichannel Speech Enhancement by Raw Waveform-Mapping Using Fully Convolutional Networks
- 2020
Computer Science
The experimental results confirm the outstanding denoising capability of the proposed speech enhancement (SE) systems on the three tasks and the benefit of the residual architecture to overall SE performance.
Speech Enhancement Based on Time Domain Parallel Full Convolutional Networks
- 2021
Computer Science
Simulation results show that the parallel time-domain full convolutional network algorithm for speech enhancement proposed in this paper can effectively improve speech quality.
End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement
- 2022
Computer Science, Engineering
This paper presents resource-efficient, compact neural models for end-to-end (E2E), noise-robust, waveform-based speech enhancement (SE) that combine a convolutional encoder-decoder with recurrent neural networks in the convolutional recurrent network (CRN) framework, and demonstrates that the proposed E2E SE models generalize across different speech datasets.
Convolutional Transformer based Local and Global Feature Learning for Speech Enhancement
- 2023
Computer Science
The proposed two-stage convolutional transformer for time-domain speech enhancement outperformed the existing models it was compared with in terms of STOI (short-time objective intelligibility) and PESQ (perceptual evaluation of speech quality).
Speech Denoising with Deep Feature Losses
- 2019
Computer Science, Engineering
An end-to-end deep learning approach to denoising speech signals that processes the raw waveform directly and outperforms the state of the art both in objective speech quality metrics and in large-scale perceptual experiments with human listeners.
Speech Enhancement Algorithm Based on a Convolutional Neural Network Reconstruction of the Temporal Envelope of Speech in Noisy Environments
- 2023
Computer Science, Engineering
A speech enhancement algorithm that constructs the temporal envelope (TEV) in the time-frequency domain by means of an embedded convolutional neural network (CNN) and is shown to outperform conventional TEV-based speech enhancement algorithms.
Densely Connected Network with Time-frequency Dilated Convolution for Speech Enhancement
- 2019
Computer Science
A densely connected network with time-frequency (T-F) dilated convolution for speech enhancement that not only significantly improves computational efficiency but also produces satisfactory enhancement performance compared with competing methods.
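Dilated convolution is one way such a network can cover long contexts cheaply: each tap of a dilated kernel skips `dilation - 1` samples, so stacking layers with growing dilation widens the receptive field much faster than the parameter count grows. The sketch below is a simplified, single-channel 1-D illustration along time only; the paper dilates in the time-frequency plane, and the kernel size of 3 and the doubling schedule `[1, 2, 4, 8]` are hypothetical choices for the example.

```python
import numpy as np


def dilated_conv1d(x, kernel, dilation):
    """'Same'-padded 1-D dilated cross-correlation for one channel (odd kernel length)."""
    k = len(kernel)
    total_pad = (k - 1) * dilation
    left = total_pad // 2
    xp = np.pad(x, (left, total_pad - left))
    t = len(x)
    y = np.zeros(t)
    for j, w in enumerate(kernel):
        # Tap j reads the input `dilation` samples further along than tap j - 1.
        y += w * xp[j * dilation : j * dilation + t]
    return y


def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output sample of a dilated conv stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


dilations = [1, 2, 4, 8]                  # hypothetical doubling schedule
print(receptive_field(3, dilations))      # 31

# Empirical check: push a unit impulse through the stack and count how far it spreads.
x = np.zeros(101)
x[50] = 1.0
for d in dilations:
    x = dilated_conv1d(x, np.ones(3), d)
print(np.count_nonzero(x))                # 31
```

With four 3-tap layers (12 weights) the stack sees 31 samples, whereas the same four layers without dilation would see only 9.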
47 References
Very deep convolutional neural networks for raw waveforms
- 2017
Computer Science, Engineering
This work proposes very deep convolutional neural networks that take time-domain waveforms directly as inputs and are efficient to optimize over the very long sequences needed for processing acoustic waveforms.
SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement
- 2016
Computer Science
Results demonstrate that a CNN with the two proposed SNR-aware algorithms outperforms its deep neural network counterpart on standardized objective evaluations when both use the same number of layers and nodes.
Convolutional neural networks for acoustic modeling of raw time signal in LVCSR
- 2015
Computer Science
It is shown that the performance gap between DNNs trained on spliced hand-crafted features and DNNs trained on the raw time signal can be strongly reduced by introducing 1D convolutional layers.
Convolutional maxout neural networks for speech separation
- 2015
Computer Science
The proposed convolutional maxout neural networks (CMNNs), which separate speech from noise by estimating the ideal ratio mask of each time-frequency unit, outperform a traditional DNN-based system in both objective speech quality and intelligibility.
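The ideal ratio mask (IRM) assigns each time-frequency unit a gain between 0 and 1 based on how much of its energy belongs to speech. A common form is IRM(t, f) = (|S|² / (|S|² + |N|²))^β with β = 0.5; the sketch below computes that form on a toy 2x2 magnitude array. The value of β, the `eps` floor, and the additive-power approximation used to build the noisy magnitude are assumptions for illustration, not details taken from the CMNN paper, and the network that would estimate the mask from noisy features is not shown.

```python
import numpy as np


def ideal_ratio_mask(clean_mag, noise_mag, beta=0.5, eps=1e-12):
    """IRM(t, f) = (|S|^2 / (|S|^2 + |N|^2)) ** beta for every T-F unit."""
    s_pow = clean_mag ** 2
    n_pow = noise_mag ** 2
    return (s_pow / (s_pow + n_pow + eps)) ** beta


# Toy magnitudes (rows: frames, columns: frequency bins).
clean = np.array([[1.0, 0.0], [3.0, 2.0]])
noise = np.array([[0.0, 1.0], [4.0, 2.0]])
mask = ideal_ratio_mask(clean, noise)

# Under the approximation |Y|^2 ~ |S|^2 + |N|^2 (speech and noise uncorrelated),
# a square-root IRM applied to the noisy magnitude gives back the clean magnitude.
noisy_mag = np.sqrt(clean ** 2 + noise ** 2)
estimate = mask * noisy_mag
print(np.round(mask, 3))
print(np.round(estimate, 3))
```

Units dominated by speech get a mask near 1, units dominated by noise get a mask near 0, and mixed units are attenuated proportionally.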
Complex spectrogram enhancement by convolutional neural network with multi-metrics learning
- 2017
Computer Science
A novel convolutional neural network model is proposed for complex spectrogram enhancement, namely estimating clean real and imaginary (RI) spectrograms from noisy ones, and the learning process is called multi-metrics learning (MML).
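Estimating real and imaginary (RI) spectrograms means the model predicts the full complex spectrum as two real-valued channels, so phase is recovered implicitly instead of being borrowed from the noisy input. The sketch below shows only that target representation for a single windowed frame; the 512-sample frame, Hann window, and random test signal are assumptions, and neither the CNN nor the multi-metrics learning procedure is reproduced.

```python
import numpy as np

frame_len = 512
rng = np.random.default_rng(1)
frame = rng.normal(size=frame_len) * np.hanning(frame_len)

spec = np.fft.rfft(frame)                  # complex spectrum, frame_len // 2 + 1 bins
ri = np.stack([spec.real, spec.imag])      # (2, 257): the two channels a model would predict

# Recombining the RI channels and inverting recovers the frame exactly,
# with no separate phase estimate required.
recon = np.fft.irfft(ri[0] + 1j * ri[1], n=frame_len)
print(ri.shape, np.allclose(recon, frame))  # (2, 257) True

# By contrast, the magnitude alone (zero phase) does not reconstruct the frame.
magnitude_only = np.fft.irfft(np.abs(spec), n=frame_len)
print(np.allclose(magnitude_only, frame))   # False
```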
Acoustic modeling with deep neural networks using raw time signal for LVCSR
- 2014
Computer Science, Engineering
Inspired by the multi-resolution analysis layer learned automatically from raw time-signal input, the DNN is trained on a combination of multiple short-term features, illustrating how the DNN can learn from the small differences between MFCC, PLP and Gammatone features.
An Experimental Study on Speech Enhancement Based on Deep Neural Networks
- 2014
Computer Science
This letter presents a regression-based speech enhancement framework using deep neural networks (DNNs) with a multiple-layer deep architecture that tends to achieve significant improvements in terms of various objective quality measures.
SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement
- 2016
Computer Science
Experimental results demonstrate that SNR-based progressive learning can effectively improve perceptual evaluation of speech quality and short-time objective intelligibility in low-SNR environments, and can reduce the number of model parameters by 50% compared with the DNN baseline system.
A Regression Approach to Speech Enhancement Based on Deep Neural Networks
- 2015
Computer Science
The proposed DNN approach can effectively suppress highly nonstationary noise, which is generally hard to handle, and is effective on noisy speech recorded in real-world scenarios without generating the annoying musical artifacts commonly observed in conventional enhancement methods.
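Regression-based DNN enhancement of this kind, and the LPS-based DNN baseline that the FCN paper compares against, works on log-power spectrum (LPS) features: the network predicts a clean LPS from a noisy one, and the waveform is then resynthesized, typically with the noisy phase. The sketch below shows the feature extraction and resynthesis steps for a single frame; the frame length, the `EPS` floor, and the synthetic 440 Hz test signal are hypothetical, and the DNN itself is replaced by an oracle that supplies the true clean LPS.

```python
import numpy as np

EPS = 1e-12  # floor that avoids log(0); hypothetical value


def lps_features(frame):
    """Log-power spectrum of one windowed frame, plus its phase (kept for resynthesis)."""
    spec = np.fft.rfft(frame)
    return np.log(np.abs(spec) ** 2 + EPS), np.angle(spec)


def lps_to_frame(lps, phase, n):
    """Invert an LPS using a supplied phase; the LPS itself carries no phase."""
    mag = np.sqrt(np.maximum(np.exp(lps) - EPS, 0.0))
    return np.fft.irfft(mag * np.exp(1j * phase), n=n)


n = 512
rng = np.random.default_rng(2)
t = np.arange(n)
window = np.hanning(n)
clean = np.sin(2 * np.pi * 440 * t / 16000) * window
noisy = clean + 0.1 * rng.normal(size=n) * window

noisy_lps, noisy_phase = lps_features(noisy)   # network input
clean_lps, clean_phase = lps_features(clean)   # regression target during training

# Oracle experiment: a perfect clean LPS resynthesized with the noisy phase
# still differs from the clean frame, because the phase was never enhanced.
resynth = lps_to_frame(clean_lps, noisy_phase, n)
print(np.abs(resynth - clean).max())           # non-zero residual error
```

This residual phase error is one motivation for the raw-waveform and RI-spectrogram approaches listed on this page, which avoid reusing the noisy phase.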
Speech enhancement based on deep denoising autoencoder
- 2013
Computer Science
Experimental results show that increasing the depth of the DAE consistently improves performance when a large training set is available, and that, compared with a minimum mean square error (MMSE)-based speech enhancement algorithm, the proposed DAE performs better on the three objective evaluations.





