Phase-aware Single-stage Speech Denoising and Dereverberation with U-Net
@article{Choi2020PhaseawareSS,
  title={Phase-aware Single-stage Speech Denoising and Dereverberation with U-Net},
  author={Hyeong-Seok Choi and Hoon Heo and Jie Hwan Lee and Kyogu Lee},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.00687}
}
In this work, we tackle denoising and dereverberation with a single-stage framework. Although denoising and dereverberation are often treated as two separate challenging tasks, each typically requiring its own module, we show that a single deep network can be shared to solve both problems. To this end, we propose a new masking method called the phase-aware beta-sigmoid mask (PHM), which reuses the estimated magnitude values to estimate the clean phase by respecting the…
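The truncated sentence above describes the key idea of PHM: the estimated magnitudes are reused to recover the clean phase. A minimal sketch of that geometric idea, with illustrative notation (|Y|, |Ŝ|, |N̂| for the mixture, estimated speech, and estimated residual magnitudes at a T-F bin) that is not the paper's exact formulation:

```latex
% If |\hat S| and |\hat N| are constrained so that they can close a triangle with
% |Y| in the complex plane, the phase offset between mixture and speech follows
% from the law of cosines, so the phase estimate reuses the magnitude estimates:
\[
\cos\Delta\theta_{t,f} \;=\;
  \frac{|Y_{t,f}|^{2} + |\hat S_{t,f}|^{2} - |\hat N_{t,f}|^{2}}
       {2\,|Y_{t,f}|\,|\hat S_{t,f}|},
\qquad
\hat\theta^{S}_{t,f} \;=\; \theta^{Y}_{t,f} \pm \Delta\theta_{t,f}.
\]
```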
10 Citations
Training Speech Enhancement Systems with Noisy Speech Datasets
- Computer Science, ArXiv
- 2021
This paper proposes several modifications of common loss functions that make them robust against noisy speech targets, as well as a noise augmentation scheme for mixture-invariant training (MixIT) that allows MixIT to be used in such scenarios.
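A minimal NumPy sketch of the mixture-invariant training (MixIT) idea mentioned above; the loss-function modifications and the noise-augmentation scheme proposed in that paper are not reproduced here, and all names are illustrative:

```python
import itertools
import numpy as np

def neg_snr(ref, est, eps=1e-8):
    """Negative SNR between a reference mixture and a reconstruction of it."""
    return -10.0 * np.log10(np.sum(ref ** 2) / (np.sum((ref - est) ** 2) + eps) + eps)

def mixit_loss(mix1, mix2, est_sources):
    """Mixture-invariant training: the separator is fed mix1 + mix2 and outputs
    est_sources (M, T); each estimated source is assigned to exactly one of the
    two reference mixtures, and the best assignment defines the loss, so no
    clean targets are required."""
    M = est_sources.shape[0]
    best = np.inf
    for assign in itertools.product([0, 1], repeat=M):
        a = np.asarray(assign)
        rec1 = est_sources[a == 0].sum(axis=0) if (a == 0).any() else np.zeros_like(mix1)
        rec2 = est_sources[a == 1].sum(axis=0) if (a == 1).any() else np.zeros_like(mix2)
        best = min(best, neg_snr(mix1, rec1) + neg_snr(mix2, rec2))
    return best
```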
Predicting score distribution to improve non-intrusive speech quality estimation
- Computer Science, ArXiv
- 2022
Several ways to integrate the distribution of opinion scores (e.g., variance and histogram information) into non-intrusive MOS estimation are investigated, yielding up to a 0.016 RMSE and 1% SRCC improvement.
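A small illustrative sketch (not from the paper) of what predicting a score distribution rather than a single value looks like: the network outputs logits over discrete opinion-score bins, and the MOS and its spread are read off the distribution.

```python
import numpy as np

score_bins = np.linspace(1.0, 5.0, 9)        # hypothetical bins: 1.0, 1.5, ..., 5.0
logits = np.random.randn(len(score_bins))    # stand-in for the network's per-bin outputs

probs = np.exp(logits - logits.max())
probs /= probs.sum()                         # softmax over the score bins

mos = float(np.dot(probs, score_bins))                 # expected opinion score
var = float(np.dot(probs, (score_bins - mos) ** 2))    # spread of listener opinions
print(f"MOS estimate: {mos:.2f}, variance: {var:.2f}")
```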
HiFi-GAN-2: Studio-Quality Speech Enhancement via Generative Adversarial Networks Conditioned on Acoustic Features
- Computer Science, 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
- 2021
Objective and subjective evaluations show that the proposed HiFi-GAN-2 outperforms state-of-the-art baselines on both conventional denoising and joint dereverberation and denoising tasks.
Transformers with Competitive Ensembles of Independent Mechanisms
- Computer Science, ArXiv
- 2021
This work proposes Transformers with Independent Mechanisms (TIM), a new Transformer layer that divides the hidden representation and parameters into multiple mechanisms that exchange information only through attention, together with a competition mechanism that encourages the mechanisms to specialize over time steps and thus become more independent.
VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration
- Computer Science, Physics, ArXiv
- 2022
VoiceFixer includes a synthesis stage that generates the waveform with a neural vocoder; both objective and subjective evaluations show that it is effective on severely degraded speech, such as real-world historical recordings.
Deep learning in electron microscopy
- Computer Science, Mach. Learn. Sci. Technol.
- 2021
This review offers a practical perspective aimed at developers with limited familiarity with deep learning in electron microscopy, discussing the hardware and software needed to get started with deep learning and to interface with electron microscopes.
ICASSP 2021 Deep Noise Suppression Challenge
- Computer Science, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2021
A DNS challenge special session at INTERSPEECH 2020 was organized, for which the training and test datasets were open-sourced and a subjective evaluation framework was released and used to evaluate and select the final winners.
Interactive Speech and Noise Modeling for Speech Enhancement
- Computer Science, AAAI
- 2021
This paper proposes a novel idea to model speech and noise simultaneously in a two-branch convolutional neural network, namely SN-Net, and designs a feature extraction module, residual-convolution-and-attention (RA), to capture correlations along the temporal and frequency dimensions for both speech and noise.
Interspeech 2021 Deep Noise Suppression Challenge
- Computer Science, Interspeech
- 2021
In this version of the Deep Noise Suppression challenge, the training and test datasets were expanded to accommodate fullband scenarios and challenging test conditions, and a reliable non-intrusive objective speech quality metric for wideband, called DNSMOS, was made available for participants to use during their development phase.
Sandglasset: A Light Multi-Granularity Self-Attentive Network for Time-Domain Speech Separation
- Computer Science, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2021
This work introduces a self-attentive network with a novel sandglass shape, namely Sandglasset, which advances state-of-the-art (SOTA) speech separation (SS) performance at a significantly smaller model size and computational cost.
References
Showing 1-10 of 34 references
Phase-aware Speech Enhancement with Deep Complex U-Net
- Computer Science, ICLR
- 2019
A novel loss function, the weighted source-to-distortion ratio (wSDR) loss, is designed to directly correlate with a quantitative evaluation measure, and the proposed model achieves state-of-the-art performance on all metrics.
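A NumPy sketch of a weighted SDR-style loss in the spirit of wSDR; the weighting by speech-to-noise energy and the use of the implied residual follow common descriptions of that paper, but the exact details are assumptions:

```python
import numpy as np

def neg_cos_sim(a, b, eps=1e-8):
    """Negative cosine similarity between waveforms; bounded in [-1, 1]."""
    return -np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)

def wsdr_loss(mix, clean, est, eps=1e-8):
    """Weighted SDR-style loss: combine the loss on the estimated speech with the
    loss on the implied residual, weighted by the speech-to-noise energy ratio."""
    noise = mix - clean            # true residual component
    est_noise = mix - est          # residual implied by the estimate
    alpha = np.sum(clean ** 2) / (np.sum(clean ** 2) + np.sum(noise ** 2) + eps)
    return alpha * neg_cos_sim(clean, est) + (1.0 - alpha) * neg_cos_sim(noise, est_noise)
```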
Two-Stage Deep Learning for Noisy-Reverberant Speech Enhancement
- Physics, IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2019
This work proposes a two-stage strategy to enhance corrupted speech, where denoising and dereverberation are conducted sequentially using deep neural networks, and designs a new objective function that incorporates clean phase during model training to better estimate spectral magnitudes.
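One common way to incorporate the clean phase into a magnitude objective, shown here only as an illustration of the general technique (the paper's exact objective may differ), is a phase-sensitive target that scales the clean magnitude by the cosine of the clean-to-mixture phase difference:

```latex
\[
\mathcal{L} \;=\; \sum_{t,f}\Bigl(\hat M_{t,f}\,|Y_{t,f}|
  \;-\; |S_{t,f}|\cos\!\bigl(\theta^{S}_{t,f}-\theta^{Y}_{t,f}\bigr)\Bigr)^{2},
\]
% \hat M is the estimated mask, Y the noisy-reverberant mixture, S the clean speech.
```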
PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network
- Computer Science, AAAI
- 2020
This paper proposes a phase-and-harmonics-aware deep neural network (DNN), named PHASEN, which has the ability to handle detailed phase patterns and to utilize harmonic patterns, and outperforms previous methods by a large margin on four metrics.
Channel-Attention Dense U-Net for Multichannel Speech Enhancement
- Computer Science, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
This paper proposes Channel-Attention Dense U-Net, in which the channel-attention unit is applied recursively on feature maps at every layer of the network, enabling the network to perform non-linear beamforming.
Enhanced Time-Frequency Masking by Using Neural Networks for Monaural Source Separation in Reverberant Room Environments
- Computer Science, 2018 26th European Signal Processing Conference (EUSIPCO)
- 2018
The proposed enhanced time-frequency (T-F) mask improves separation performance and outperforms state-of-the-art methods, particularly in highly reverberant and noisy room environments.
End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction
- Computer Science, INTERSPEECH
- 2018
This paper proposes an end-to-end approach for single-channel speaker-independent multi-speaker speech separation, where time-frequency (T-F) masking, the short-time Fourier transform (STFT), and its…
Multi-Scale multi-band densenets for audio source separation
- Computer Science, 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
- 2017
A novel network architecture that extends the recently developed densely connected convolutional network (DenseNet) to take advantage of long contextual information; it outperforms state-of-the-art results on the SiSEC 2016 competition by a large margin in terms of signal-to-distortion ratio.
PhaseNet: Discretized Phase Modeling with Deep Neural Networks for Audio Source Separation
- Computer Science, INTERSPEECH
- 2018
Experimental results show that the classification-based approach successfully recovers the phase of the target source in the discretized domain, improves signal-to-distortion ratio (SDR) over the regression-based approach on both the speech enhancement and music source separation (MSS) tasks, and outperforms state-of-the-art MSS systems.
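An illustrative sketch of the discretized-phase idea (the bin count and the decoding rule are assumptions, not taken from the paper): phase is quantized into K classes, the network predicts per-bin class probabilities, and a phase estimate is decoded from them.

```python
import numpy as np

K = 8                                                        # assumed number of phase classes
centers = -np.pi + (2.0 * np.pi / K) * (np.arange(K) + 0.5)  # class centres in (-pi, pi]

def phase_to_class(theta):
    """Classification target: index of the class centre closest to theta (circularly)."""
    return int(np.argmin(np.abs(np.angle(np.exp(1j * (theta - centers))))))

def class_probs_to_phase(probs):
    """Decode a phase estimate from predicted class probabilities via a circular
    mean; a simple argmax over classes would be another plausible choice."""
    return float(np.angle(np.sum(probs * np.exp(1j * centers))))
```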
SDR – Half-baked or Well Done?
- Geology, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019
It is argued here that the signal-to-distortion ratio (SDR) implemented in the BSS_eval toolkit has generally been improperly used and abused, especially in the case of single-channel separation, resulting in misleading results.
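The remedy proposed in that paper is the scale-invariant SDR (SI-SDR). A minimal NumPy sketch (the zero-mean preprocessing step is an assumption):

```python
import numpy as np

def si_sdr(ref, est, eps=1e-8):
    """Scale-invariant SDR: project the estimate onto the reference so the metric
    is insensitive to overall scaling, then compare projection and residual energy."""
    ref = ref - ref.mean()
    est = est - est.mean()
    proj = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref   # optimally scaled reference
    noise = est - proj
    return 10.0 * np.log10((np.sum(proj ** 2) + eps) / (np.sum(noise ** 2) + eps))
```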
Masking Estimation with Phase Restoration of Clean Speech for Monaural Speech Enhancement
- Physics, INTERSPEECH
- 2019
Two T-F masks are presented to simultaneously enhance the magnitude and phase of the speech spectrum, based on the assumption that the real and imaginary parts of the speech spectrum are uncorrelated, and they are used as the training targets of the DNN model.