HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
@inproceedings{Su2020HiFiGANHD,
  title={HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks},
  author={Jiaqi Su and Zeyu Jin and A. Finkelstein},
  booktitle={INTERSPEECH},
  year={2020}
}
Real-world audio recordings are often degraded by factors such as noise, reverberation, and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to transform recorded speech to sound as though it had been recorded in a studio. We use an end-to-end feed-forward WaveNet architecture, trained with multi-scale adversarial discriminators in both the time domain and the time-frequency domain. It relies on the deep feature matching losses of the discriminators to improve the…
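The abstract mentions deep feature matching losses computed from the discriminators. As a rough illustration only, the sketch below shows one common form of such a loss: the mean absolute difference between intermediate discriminator activations for clean and enhanced speech, averaged over layers. The function name `feature_matching_loss`, the array shapes, and the plain L1 averaging are assumptions made for this example; the excerpt does not state the paper's exact formulation or weighting.

```python
import numpy as np

def feature_matching_loss(real_feats, fake_feats):
    """Mean L1 distance between paired feature maps, averaged over layers.

    real_feats / fake_feats: lists of arrays, one per discriminator layer
    (hypothetical stand-ins for activations on clean and enhanced audio).
    """
    if len(real_feats) != len(fake_feats):
        raise ValueError("feature lists must have the same number of layers")
    per_layer = [np.mean(np.abs(r - f)) for r, f in zip(real_feats, fake_feats)]
    return float(np.mean(per_layer))

# Toy activations for two layers; real features would come from a discriminator.
rng = np.random.default_rng(0)
clean = [rng.standard_normal((4, 16)), rng.standard_normal((8, 8))]
enhanced = [f + 0.1 for f in clean]  # constant offset of 0.1 in every feature

loss = feature_matching_loss(clean, enhanced)
```

In a training setup this term would typically be added to the generator's adversarial loss, so that the generator is pushed to match the discriminator's internal representation of clean speech rather than only its final real/fake decision.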
10 Citations
Enhancing Low-Quality Voice Recordings Using Disentangled Channel Factor and Neural Waveform Model
- Computer Science, Engineering
- 2021 IEEE Spoken Language Technology Workshop (SLT)
- 2021
Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains
- Engineering, Computer Science
- ArXiv
- 2020
High Fidelity Speech Regeneration with Application to Speech Enhancement
- Computer Science, Engineering
- ArXiv
- 2021
- 1
Perceptual Loss based Speech Denoising with an ensemble of Audio Pattern Recognition and Self-Supervised Models
- Computer Science, Engineering
- ArXiv
- 2020
- 1
CDPAM: Contrastive learning for perceptual audio similarity
- Computer Science, Engineering
- ArXiv
- 2021
MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement
- Computer Science, Engineering
- ArXiv
- 2021
All for One and One for All: Improving Music Separation by Bridging Networks
- Computer Science, Engineering
- ArXiv
- 2020
Improving Perceptual Quality by Phone-Fortified Perceptual Loss for Speech Enhancement
- Computer Science, Engineering
- ArXiv
- 2020
- Highly Influenced
Context-Aware Prosody Correction for Text-Based Speech Editing
- Engineering, Computer Science
- ArXiv
- 2021
References
SEGAN: Speech Enhancement Generative Adversarial Network
- Computer Science, Mathematics
- INTERSPEECH
- 2017
- 532
Towards Generalized Speech Enhancement with Generative Adversarial Networks
- Computer Science, Engineering
- INTERSPEECH
- 2019
- 10
Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition
- Computer Science, Engineering
- 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
- 104
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
- Computer Science, Engineering
- NeurIPS
- 2019
- 134
Perceptually-motivated Environment-specific Speech Enhancement
- Computer Science
- ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019
- 5
Impulse Response Data Augmentation and Deep Neural Networks for Blind Room Acoustic Parameter Estimation
- Computer Science
- ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
- 9
Learning Spectral Mapping for Speech Dereverberation and Denoising
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2015
- 105
Improving GANs for Speech Enhancement
- Computer Science, Engineering
- IEEE Signal Processing Letters
- 2020
- 12