Corpus ID: 237941057

Joint magnitude estimation and phase recovery using Cycle-in-Cycle GAN for non-parallel speech enhancement

  • Guochen Yu, Andong Li, Yutian Wang, Yinuo Guo, Chengshi Zheng, Hui Wang
  • Published 26 September 2021
  • Computer Science, Engineering
  • ArXiv
Due to the lack of adequate paired noisy-clean speech corpora in many real scenarios, non-parallel training is a promising direction for DNN-based speech enhancement methods. However, because of the severe mismatch between input and target speech, many previous studies focus only on magnitude spectrum estimation and leave the phase unaltered, resulting in degraded speech quality under low signal-to-noise ratio conditions. To tackle this problem, we decouple the difficult target w.r.t. original…
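The decoupling idea described in the abstract can be illustrated with a minimal two-stage sketch: first estimate a clean magnitude while reusing the noisy phase, then refine the complex spectrum, which implicitly recovers phase. Here `mag_net` and `complex_net` are hypothetical stand-ins for trained networks, not the paper's actual models.

```python
import numpy as np

def enhance_decoupled(noisy_spec, mag_net, complex_net):
    """Two-stage sketch of magnitude-then-phase enhancement.

    noisy_spec:  complex STFT of the noisy speech (any shape).
    mag_net:     hypothetical magnitude-estimation model.
    complex_net: hypothetical complex-refinement model (predicts a residual).
    """
    noisy_mag = np.abs(noisy_spec)
    noisy_phase = np.angle(noisy_spec)
    # Stage 1: estimate the clean magnitude; the noisy phase is reused
    # for the coarse reconstruction (the usual magnitude-only shortcut).
    est_mag = mag_net(noisy_mag)
    coarse = est_mag * np.exp(1j * noisy_phase)
    # Stage 2: a complex-domain refinement adds a residual over the
    # real/imaginary parts, implicitly correcting the phase.
    refined = coarse + complex_net(coarse)
    return refined
```

With an identity magnitude model and a zero refinement, the sketch reduces to a pass-through, which makes the staging easy to verify.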

A Simultaneous Denoising and Dereverberation Framework with Target Decoupling
An integrated framework is proposed to address simultaneous denoising and dereverberation in complicated acoustic environments; it adopts a chain optimization strategy and designs four sub-stages accordingly.
A Two-stage Complex Network using Cycle-consistent Generative Adversarial Networks for Speech Enhancement
  • Guochen Yu, Yutian Wang, Hui Wang, Qin Zhang, Chengshi Zheng
  • Computer Science, Engineering
  • Speech Communication
  • 2021
A novel two-stage denoising system that combines a CycleGAN-based magnitude enhancing network with a subsequent complex spectral refining network; it consistently surpasses previous one-stage CycleGANs and other state-of-the-art SE systems on various evaluation metrics, especially in background noise suppression.
CycleGAN-based Non-parallel Speech Enhancement with an Adaptive Attention-in-attention Mechanism
This paper proposes an integration of adaptive time-frequency attention (ATFA) and adaptive hierarchical attention (AHA) to form an attention-in-attention (AIA) module for more flexible feature learning during the mapping procedure in non-parallel speech enhancement.
DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors
This paper introduces a multi-stage self-teaching-based perceptual objective metric designed to evaluate noise suppressors; it generalizes well in challenging test conditions with a high correlation to human ratings.
Self-Attention Generative Adversarial Network for Speech Enhancement
  • Huy Phan, Huy L. Nguyen, +4 authors A. Mertins
  • Computer Science, Engineering
  • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
The experiments show that introducing self-attention into SEGAN leads to consistent improvement across the objective evaluation metrics of enhancement performance, and suggest that it can be conveniently applied at the highest-level (de)convolutional layer with the smallest memory overhead.
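The self-attention mechanism referenced above can be sketched as a standard scaled dot-product attention with a residual connection, as in non-local blocks. This is a generic illustration, not the paper's exact layer; the projection weights `Wq`, `Wk`, `Wv` are hypothetical.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention over T positions, C channels.

    x:  feature map of shape (T, C).
    Wq, Wk, Wv: (C, C) projection matrices (hypothetical, normally learned).
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # Attention scores between every pair of positions, scaled by sqrt(C).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Numerically stable softmax over the key axis.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    # Residual connection: attended values are added back to the input.
    return x + w @ v
```

Because of the residual connection, a zero value projection passes the input through unchanged, which is one reason such blocks can be dropped into an existing (de)convolutional stack.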
Two Heads are Better Than One: A Two-Stage Complex Spectral Mapping Approach for Monaural Speech Enhancement
A novel complex spectral mapping approach with a two-stage pipeline for monaural speech enhancement in the time-frequency domain, which decouples the primal problem into multiple sub-problems and achieves state-of-the-art performance over previous advanced systems under various conditions.
A Parallel-Data-Free Speech Enhancement Method Using Multi-Objective Learning Cycle-Consistent Generative Adversarial Network
  • Yang Xiang, C. Bao
  • Computer Science
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • 2020
A novel parallel-data-free speech enhancement method in which a cycle-consistent generative adversarial network (CycleGAN) and multi-objective learning are employed; it effectively improves speech quality and intelligibility even though the networks are trained without parallel data.
A time-frequency smoothing neural network for speech enhancement
A time-frequency smoothing neural network is proposed for speech enhancement, using a long short-term memory (LSTM) network and a convolutional neural network (CNN) to model correlations along the time and frequency dimensions, respectively; experimental results show that the proposed network yields better speech enhancement performance than the other networks.
CP-GAN: Context Pyramid Generative Adversarial Network for Speech Enhancement
This work makes the first attempt to explore global and local speech features for coarse-to-fine speech enhancement, and introduces a Context Pyramid Generative Adversarial Network (CP-GAN), which contains a densely-connected feature pyramid generator and a dynamic context granularity discriminator to better eliminate audio noise hierarchically.
DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement
A new network structure simulating complex-valued operations, called Deep Complex Convolution Recurrent Network (DCCRN), in which both the CNN and RNN structures can handle complex-valued operations.
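Simulating a complex-valued operation with real-valued layers, as DCCRN-style networks do, follows directly from complex multiplication: (x_r + i·x_i)(w_r + i·w_i) = (x_r·w_r − x_i·w_i) + i(x_r·w_i + x_i·w_r). A minimal sketch, where `conv` is any real-valued convolution routine (a hypothetical stand-in for a real conv layer):

```python
import numpy as np

def complex_conv(x_r, x_i, w_r, w_i, conv):
    """Complex convolution built from four real convolutions.

    x_r, x_i: real and imaginary parts of the input.
    w_r, w_i: real and imaginary parts of the kernel.
    conv:     any real-valued convolution function, e.g. np.convolve.
    """
    # Real part: Re(x*w) = x_r*w_r - x_i*w_i
    out_r = conv(x_r, w_r) - conv(x_i, w_i)
    # Imaginary part: Im(x*w) = x_r*w_i + x_i*w_r
    out_i = conv(x_r, w_i) + conv(x_i, w_r)
    return out_r, out_i
```

The same pattern applies whether `conv` is a 1-D signal convolution or a 2-D spectrogram convolution; the four real convolutions together behave exactly like one complex one.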