Corpus ID: 245650557

TFCN: Temporal-Frequential Convolutional Network for Single-Channel Speech Enhancement

@inproceedings{Jia2022TFCNTC,
  title={TFCN: Temporal-Frequential Convolutional Network for Single-Channel Speech Enhancement},
  author={Xupeng Jia and Dongmei Li},
  year={2022}
}
Deep learning based single-channel speech enhancement trains a neural network model to predict the clean speech signal. A variety of popular network structures exist for single-channel speech enhancement, such as TCNN, UNet, and WaveNet. However, these structures usually contain millions of parameters, which is an obstacle for mobile applications. In this work, we propose a lightweight neural network for speech enhancement named TFCN. It is a temporal-frequential… 
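The TCN-style models mentioned above (TCNN, Conv-TasNet) are built on dilated causal convolutions, which enlarge the temporal receptive field without adding parameters. As a minimal illustration only (this is my own NumPy sketch of the generic operation, not the TFCN architecture; function name and shapes are assumptions):

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation=1):
    """Causal dilated 1-D convolution: the output at time t depends only
    on inputs at times <= t. Zero-pads on the left so that the output
    has the same length as the input."""
    k = len(w)
    pad = (k - 1) * dilation          # left padding keeps the system causal
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    y = np.zeros(len(x))
    for t in range(len(x)):
        # tap i looks back i*dilation samples from time t
        y[t] = sum(w[i] * xp[t + pad - i * dilation] for i in range(k))
    return y
```

Stacking such layers with exponentially growing dilations (1, 2, 4, …) is what lets a small kernel cover a long context, which is one reason these models can stay lightweight.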


References

SHOWING 1-10 OF 28 REFERENCES

Low-Latency Single Channel Speech Enhancement Using U-Net Convolutional Neural Networks

  • A. Bulut, K. Koishida
  • Computer Science
    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
TLDR
This work proposes a simple but effective U-Net convolutional neural network (CNN) based architecture with skip-connections, focusing on real-time applications that require low-latency processing, and investigates the trade-off between performance and the overall latency of the proposed system.

Multi-Scale TCN: Exploring Better Temporal DNN Model for Causal Speech Enhancement

TLDR
The proposed TCN model not only achieves better speech reconstruction in terms of speech quality and speech intelligibility, but also has a smaller model size than long short-term memory (LSTM) and gated recurrent unit (GRU) networks.

A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement

TLDR
This paper incorporates a convolutional encoder-decoder (CED) and long short-term memory (LSTM) into the CRN architecture, which leads to a causal system that is naturally suitable for real-time processing.

Multiple-target deep learning for LSTM-RNN based speech enhancement

TLDR
A novel multiple-target joint learning approach is designed to fully exploit the complementarity of the learning targets, and the proposed framework consistently and significantly improves objective measures of both speech quality and intelligibility.

Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation

  • Yi Luo, N. Mesgarani
  • Computer Science
    IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • 2019
TLDR
A fully convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation, which significantly outperforms previous time-frequency masking methods in separating two- and three-speaker mixtures.

An Experimental Study on Speech Enhancement Based on Deep Neural Networks

TLDR
This letter presents a regression-based speech enhancement framework using deep neural networks (DNNs) with a multiple-layer deep architecture that tends to achieve significant improvements in terms of various objective quality measures.

A Regression Approach to Speech Enhancement Based on Deep Neural Networks

TLDR
The proposed DNN approach can effectively suppress highly nonstationary noise, which is generally difficult to handle, and deals well with noisy speech recorded in real-world scenarios without generating the annoying musical artifact commonly observed in conventional enhancement methods.

Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks

TLDR
A phase-sensitive objective function based on the signal-to-noise ratio (SNR) of the reconstructed signal is developed, and experiments show that it yields uniformly better results in terms of signal-to-distortion ratio (SDR).
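The phase-sensitive objective summarized above trains a magnitude mask toward the clean magnitude scaled by the cosine of the clean-noisy phase difference. A minimal sketch of that loss (my own NumPy illustration; variable names and shapes are assumptions, not the paper's code):

```python
import numpy as np

def phase_sensitive_loss(mask, noisy_stft, clean_stft):
    """Phase-sensitive spectrum approximation loss: the masked noisy
    magnitude is compared against the clean magnitude weighted by
    cos(phase difference), penalizing phase mismatch."""
    theta = np.angle(clean_stft) - np.angle(noisy_stft)
    target = np.abs(clean_stft) * np.cos(theta)   # phase-sensitive target
    estimate = mask * np.abs(noisy_stft)          # masked noisy magnitude
    return np.mean((estimate - target) ** 2)
```

When the noisy and clean spectra agree in phase, the target reduces to the plain clean magnitude, so this generalizes magnitude-only spectrum approximation.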

Improved Speech Enhancement Using TCN with Multiple Encoder-Decoder Layers

TLDR
This work proposes to use a multilayer encoder-decoder to obtain a noise-independent representation useful for separating clean speech and noise, and presents a t-SNE-based analysis of the representations learned with different architectures for selecting the optimal number of encoder-decoder layers.

Densely Connected Progressive Learning for LSTM-Based Speech Enhancement

TLDR
Experimental results demonstrate that the dense structure with deeper LSTM layers yields significant gains in speech intelligibility measures across all noise types and levels, and that post-processing with more targets tends to achieve better performance.