Online Monaural Speech Enhancement Using Delayed Subband LSTM
@article{Li2020OnlineMS,
  title   = {Online Monaural Speech Enhancement Using Delayed Subband LSTM},
  author  = {Xiaofei Li and Radu Horaud},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2005.05037}
}
This paper proposes a delayed subband LSTM network for online monaural (single-channel) speech enhancement. The proposed method is developed in the short-time Fourier transform (STFT) domain. Online processing requires frame-by-frame signal reception and processing. A paramount feature of the proposed method is that the same LSTM is used across frequencies, which drastically reduces the number of network parameters, the amount of training data, and the computational burden. Training is performed…
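As a rough illustration of the core idea in the abstract, the sketch below (a minimal PyTorch assumption of mine, not the authors' code) folds the frequency axis into the batch axis so that a single unidirectional LSTM processes every subband with the same weights. The subband width, hidden size, and mask output head are illustrative choices; the paper's delayed output (estimating frame t-d at frame t) would simply shift the training target by d frames.

```python
# Minimal sketch: one unidirectional LSTM shared by all frequency bins, each bin
# processed together with a few neighbouring bins ("subband"). Context width,
# hidden size and output head are assumptions, not the paper's configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubbandLSTM(nn.Module):
    def __init__(self, context: int = 2, hidden: int = 128):
        super().__init__()
        self.context = context                      # neighbouring bins on each side
        in_dim = 2 * context + 1                    # subband width
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, 1)             # per-bin magnitude mask

    def forward(self, mag: torch.Tensor) -> torch.Tensor:
        # mag: (batch, frames, freq) STFT magnitudes
        b, t, f = mag.shape
        # Gather each bin with its neighbours: (batch, frames, freq, 2*context+1)
        padded = F.pad(mag, (self.context, self.context), mode="reflect")
        sub = padded.unfold(2, 2 * self.context + 1, 1)
        # Fold frequency into the batch axis so the SAME LSTM runs on every bin
        sub = sub.permute(0, 2, 1, 3).reshape(b * f, t, -1)
        h, _ = self.lstm(sub)
        mask = torch.sigmoid(self.out(h)).reshape(b, f, t).permute(0, 2, 1)
        # The paper's "delayed" variant would train this output against the
        # target shifted by d frames (a small look-ahead); not shown here.
        return mask                                  # (batch, frames, freq)

# est_mag = SubbandLSTM()(noisy_mag) * noisy_mag    # apply the mask frame by frame
```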
14 Citations
Spectro-Temporal SubNet for Real-Time Monaural Speech Denoising and Dereverberation
- INTERSPEECH
- 2022
This paper presents an improved subband neural network applied to joint speech denoising and dereverberation for online single-channel scenarios. Preserving the advantages of the subband model (SubNet)…
DCCRN+: Channel-wise Subband DCCRN with SNR Estimation for Speech Enhancement
- Interspeech
- 2021
The model is extended to sub-band processing, where the bands are split and merged by learnable neural network filters instead of engineered FIR filters, leading to a faster noise suppressor trained in an end-to-end manner; a post-processing module is adopted to further suppress unnatural residual noise.
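As a hedged sketch of the learnable band split/merge idea mentioned in this summary (not DCCRN+'s actual layers), strided 1-D convolutions can act as learned analysis filters and a transposed convolution as the learned synthesis filter; the band count and kernel size below are assumptions.

```python
# Minimal sketch of learnable band split/merge: a strided Conv1d learns the
# analysis filters, a ConvTranspose1d learns the synthesis filter. Band count
# and kernel size are illustrative, not DCCRN+'s configuration.
import torch
import torch.nn as nn

class LearnableBandSplitMerge(nn.Module):
    def __init__(self, bands: int = 4, kernel: int = 64):
        super().__init__()
        self.split = nn.Conv1d(1, bands, kernel, stride=bands, padding=kernel // 2)
        self.merge = nn.ConvTranspose1d(bands, 1, kernel, stride=bands, padding=kernel // 2)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, 1, samples)
        subbands = self.split(wav)          # (batch, bands, ~samples / bands)
        # ... a per-band enhancement network would operate on `subbands` here ...
        return self.merge(subbands)         # back to (batch, 1, ~samples)
```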
FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement
- ICASSP
- 2022
An extended single-channel real-time speech enhancement framework called FullSubNet+ is proposed, which reaches state-of-the-art (SOTA) performance and outperforms other existing speech enhancement approaches.
Single-Channel Speech Dereverberation using Subband Network with A Reverberation Time Shortening Target
- ArXiv
- 2022
This work proposes a subband network for single-channel speech dereverberation, and also a new learning target based on reverberation time shortening (RTS). In the time-frequency domain, we propose…
Vector-quantized Variational Autoencoder for Phase-aware Speech Enhancement
- INTERSPEECH
- 2022
A phase-aware speech-enhancement method that estimates the magnitude and phase of a complex adaptive Wiener filter is proposed; evaluated on the open Voice Bank+DEMAND dataset, it achieves a Perceptual Evaluation of Speech Quality (PESQ) score of 2.85 and a Short-Time Objective Intelligibility (STOI) score of 0.94, better than the state-of-the-art method based on cIRM estimation in the 2020 Deep Noise Suppression Challenge.
Speech Dereverberation with a Reverberation Time Shortening Target
- ArXiv
- 2022
This work proposes a new learning target based on reverberation time shortening (RTS) for speech dereverberation. The learning target for dereverberation is usually set as the direct-path speech or…
Lightweight Full-band and Sub-band Fusion Network for Real Time Speech Enhancement
- Interspeech 2022
- 2022
A lightweight full-band and sub-band fusion network is proposed, in which a dual-branch architecture models local and global spectral patterns simultaneously; it achieves performance superior to other state-of-the-art approaches with a smaller model size and lower latency.
Speech Enhancement with Fullband-Subband Cross-Attention Network
- INTERSPEECH
- 2022
FullSubNet has shown its promising performance on speech enhancement by utilizing both fullband and subband information. However, the relationship between fullband and subband in FullSubNet is…
Quality Enhancement of Overdub Singing Voice Recordings
- 2021
Singing enhancement aims to improve the perceived quality of a singing voice recording in various aspects. Focusing on the aspect of removing degradation such as background noise or room…
Fast FullSubNet: Accelerate Full-band and Sub-band Fusion Model for Single-channel Speech Enhancement
- 2022
Experimental results show that, compared to FullSubNet, Fast FullSubNet has only 13% of the computational complexity and 16% of the processing time, and achieves comparable or even better performance.
References
SHOWING 1-10 OF 29 REFERENCES
Audio-Noise Power Spectral Density Estimation Using Long Short-Term Memory
- IEEE Signal Processing Letters
- 2019
Speaker- and speech-independent experiments with different types of noise show that the proposed method outperforms the unsupervised estimators, and it generalizes well to noise types that are not present in the training set.
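A minimal sketch of the kind of supervised noise-PSD estimator summarised above, assuming a causal LSTM that maps noisy log-power spectra to per-frame noise log-PSDs; the layer sizes and log-domain target are my assumptions, not the cited paper's exact design.

```python
# Minimal sketch: a causal LSTM estimates the per-frame noise PSD (in the log
# domain) from the noisy log-power spectrum. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class NoisePSDEstimator(nn.Module):
    def __init__(self, n_freq: int = 257, hidden: int = 256):
        super().__init__()
        self.lstm = nn.LSTM(n_freq, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_freq)

    def forward(self, noisy_logpow: torch.Tensor) -> torch.Tensor:
        # noisy_logpow: (batch, frames, n_freq) = log(|Y(t, f)|^2 + eps)
        h, _ = self.lstm(noisy_logpow)
        return self.out(h)                  # estimated log noise PSD per frame

# The estimate can then drive a conventional spectral-subtraction/Wiener-style
# gain, e.g. gain = clamp(1 - exp(noise_logpow_hat - noisy_logpow), min=0).
```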
Complex Ratio Masking for Monaural Speech Separation
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2016
The proposed approach improves over other methods when evaluated with several objective metrics, including the perceptual evaluation of speech quality (PESQ), and in a listening test where subjects prefer the proposed approach at a rate of at least 69%.
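For concreteness, a minimal NumPy sketch of complex ratio masking as summarised above: the ideal mask is the complex ratio between clean and noisy STFTs, and enhancement multiplies the noisy STFT by an estimated complex mask (the mask compression used in the cited paper is omitted here).

```python
# Minimal sketch of complex ratio masking: M = S / Y, and S_hat = M * Y.
import numpy as np

def ideal_complex_ratio_mask(clean_stft: np.ndarray, noisy_stft: np.ndarray,
                             eps: float = 1e-8) -> np.ndarray:
    """Complex-valued mask M such that M * noisy ≈ clean."""
    return clean_stft / (noisy_stft + eps)

def apply_complex_mask(noisy_stft: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Complex multiplication enhances both magnitude and phase."""
    return mask * noisy_stft

# Example with random complex spectrograms (frames x frequency bins):
rng = np.random.default_rng(0)
clean = rng.standard_normal((100, 257)) + 1j * rng.standard_normal((100, 257))
noisy = clean + 0.5 * (rng.standard_normal((100, 257)) + 1j * rng.standard_normal((100, 257)))
m = ideal_complex_ratio_mask(clean, noisy)
print(np.allclose(apply_complex_mask(noisy, m), clean, atol=1e-4))  # True
```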
Convolutional Recurrent Neural Network Based Progressive Learning for Monaural Speech Enhancement
- ArXiv
- 2019
This work proposes a novel progressive learning framework with causal convolutional recurrent neural networks called PL-CRNN, which takes advantage of both convolutional neural networks and recurrent neural networks to drastically reduce the number of parameters and simultaneously improve speech quality and speech intelligibility.
Weighted Speech Distortion Losses for Neural-Network-Based Real-Time Speech Enhancement
- ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
This paper investigates several aspects of training an RNN (recurrent neural network) that impact the objective and subjective quality of enhanced speech for real-time single-channel speech enhancement, and proposes two novel mean-squared-error-based learning objectives.
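A hedged sketch of a weighted speech-distortion / noise-suppression objective in the spirit of this summary: one MSE term penalises distortion of the clean speech, another penalises residual noise, and a weight trades them off. This is an illustrative form and weight, not the paper's exact losses.

```python
# Minimal sketch of a weighted speech-distortion / residual-noise MSE objective.
# The form and the value of alpha are illustrative assumptions.
import torch

def weighted_sd_loss(mask: torch.Tensor, clean_mag: torch.Tensor,
                     noise_mag: torch.Tensor, alpha: float = 0.35) -> torch.Tensor:
    """mask, clean_mag, noise_mag: (batch, frames, freq) magnitude spectrograms."""
    speech_distortion = torch.mean((mask * clean_mag - clean_mag) ** 2)
    residual_noise = torch.mean((mask * noise_mag) ** 2)
    return alpha * speech_distortion + (1.0 - alpha) * residual_noise
```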
Exploring Monaural Features for Classification-Based Speech Segregation
- IEEE Transactions on Audio, Speech, and Language Processing
- 2013
This paper expands T-F unit features to include gammatone frequency cepstral coefficients (GFCC), mel-frequency cepstral coefficients, relative spectral transform (RASTA) and perceptual linear prediction (PLP), and proposes to use a group Lasso approach to select complementary features in a principled way.
Single-channel speech separation with memory-enhanced recurrent neural networks
- 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2014
The proposed long short-term memory recurrent neural networks are trained to predict clean speech as well as noise features from noisy speech features, and a magnitude-domain soft mask is constructed from these features; this outperforms unsupervised magnitude-domain spectral subtraction by a large margin in terms of source-to-distortion ratio.
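A minimal sketch of the soft-mask construction summarised above, assuming the network outputs speech and noise magnitude estimates; the flooring constant and the reuse of the noisy phase are illustrative assumptions.

```python
# Minimal sketch: soft mask from estimated speech and noise magnitudes,
# applied to the noisy magnitude with the noisy phase reused.
import numpy as np

def magnitude_soft_mask(speech_mag_hat: np.ndarray, noise_mag_hat: np.ndarray,
                        eps: float = 1e-8) -> np.ndarray:
    return speech_mag_hat / (speech_mag_hat + noise_mag_hat + eps)

def enhance(noisy_mag: np.ndarray, noisy_phase: np.ndarray,
            mask: np.ndarray) -> np.ndarray:
    # Enhanced complex STFT: masked magnitude combined with the noisy phase.
    return mask * noisy_mag * np.exp(1j * noisy_phase)
```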
Multiple-target deep learning for LSTM-RNN based speech enhancement
- 2017 Hands-free Speech Communications and Microphone Arrays (HSCMA)
- 2017
A novel multiple-target joint learning approach is designed to fully utilize the complementarity of the different learning targets, and the proposed framework consistently and significantly improves objective measures of both speech quality and intelligibility.
Long short-term memory for speaker generalization in supervised speech separation.
- The Journal of the Acoustical Society of America
- 2017
A separation model based on long short-term memory (LSTM) is proposed, which naturally accounts for temporal dynamics of speech and which substantially outperforms a DNN-based model on unseen speakers and unseen noises in terms of objective speech intelligibility.
A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement
- INTERSPEECH
- 2018
This paper incorporates a convolutional encoder-decoder (CED) and long short-term memory (LSTM) into the CRN architecture, which leads to a causal system that is naturally suitable for real-time processing.
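A rough PyTorch sketch of a causal convolutional recurrent network in the spirit of this summary: a convolutional encoder downsamples along frequency, an LSTM models temporal dynamics, and a transposed-convolution decoder restores the resolution, with causality enforced by past-only time padding. The skip connections of the cited CRN are omitted, and the channel counts and kernel sizes are simplified assumptions, not the paper's configuration.

```python
# Minimal causal CRN sketch: conv encoder -> LSTM -> transposed-conv decoder.
# Past-only time padding keeps every layer causal. Skip connections omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalCRN(nn.Module):
    def __init__(self, n_freq: int = 161):
        super().__init__()
        self.enc1 = nn.Conv2d(1, 16, kernel_size=(2, 3), stride=(1, 2))
        self.enc2 = nn.Conv2d(16, 32, kernel_size=(2, 3), stride=(1, 2))
        f = n_freq
        for _ in range(2):                           # frequency size after the encoder
            f = (f - 3) // 2 + 1
        self.rnn = nn.LSTM(32 * f, 32 * f, batch_first=True)
        self.dec2 = nn.ConvTranspose2d(32, 16, kernel_size=(2, 3), stride=(1, 2),
                                       output_padding=(0, 1))
        self.dec1 = nn.ConvTranspose2d(16, 1, kernel_size=(2, 3), stride=(1, 2))

    def _causal_conv(self, x, conv):
        # One frame of past-only padding keeps the time convolution causal.
        return F.elu(conv(F.pad(x, (0, 0, 1, 0))))

    def forward(self, mag):                          # mag: (batch, frames, n_freq)
        x = mag.unsqueeze(1)                         # (batch, 1, frames, freq)
        e1 = self._causal_conv(x, self.enc1)
        e2 = self._causal_conv(e1, self.enc2)
        b, c, t, f = e2.shape
        r, _ = self.rnn(e2.permute(0, 2, 1, 3).reshape(b, t, c * f))
        d = r.reshape(b, t, c, f).permute(0, 2, 1, 3)
        d = F.elu(self.dec2(d)[:, :, :-1, :])        # drop the trailing (future) frame
        d = torch.sigmoid(self.dec1(d)[:, :, :-1, :])
        return d.squeeze(1)                          # mask: (batch, frames, n_freq)
```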
The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Speech Quality and Testing Framework
- ArXiv
- 2020
A large clean speech and noise corpus is opened for training noise suppression models, together with a test set representative of real-world scenarios, consisting of both synthetic and real recordings, and an online subjective test framework based on ITU-T P.808 for researchers to quickly test their developments.