Noise-robust voice conversion with domain adversarial training
@article{Du2022NoiserobustVC, title={Noise-robust voice conversion with domain adversarial training}, author={Hongqiang Du and Lei Xie and Haizhou Li}, journal={Neural Networks}, year={2022}, volume={148}, pages={74--84}}
4 Citations
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers
- Computer Science · INTERSPEECH
- 2022
Proposes a noise-independent speech representation learning approach for high-quality voice conversion with noisy target speakers, built on a latent feature space that ensures the target distribution modeled by the conversion model matches the distribution modeled by the waveform generator.
Preserving background sound in noise-robust voice conversion via multi-task learning
- Computer Science · ArXiv
- 2022
Experimental results demonstrate that the proposed end-to-end multi-task learning framework outperforms the baseline systems while achieving quality and speaker similarity comparable to VC models trained on clean data.
Deep MCANC: A deep learning approach to multi-channel active noise control.
- Computer Science · Neural Networks
- 2023
References
Showing 1-10 of 59 references
Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations
- Computer Science · INTERSPEECH
- 2018
An adversarial learning framework for voice conversion is proposed, with which a single model can be trained to convert the voice to many different speakers, all without parallel data, by separating the speaker characteristics from the linguistic content in speech signals.
Adversarial Feature Learning and Unsupervised Clustering Based Speech Synthesis for Found Data With Acoustic and Textual Noise
- Computer Science · IEEE Signal Processing Letters
- 2020
An approach to building a high-quality and stable seq2seq speech synthesis system from challenging found data is proposed, along with a VQ-VAE based heuristic that compensates for erroneous linguistic features using phonetic information learned directly from speech.
Improving robustness of one-shot voice conversion with deep discriminative speaker encoder
- Computer Science · INTERSPEECH
- 2021
This paper proposes a deep discriminative speaker encoder that can improve the robustness of one-shot voice conversion for unseen speakers and outperforms baseline systems in terms of speech quality and speaker similarity.
One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization
- Computer Science · INTERSPEECH
- 2019
This paper proposes a novel one-shot VC approach that performs conversion using only one example utterance from the source speaker and one from the target speaker, neither of whom needs to be seen during training.
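As a rough illustration of the instance-normalization idea behind this approach, the sketch below shows how IN without affine parameters strips per-channel utterance statistics (which tend to carry speaker traits) from content features. The PyTorch framing and shapes are assumptions for illustration, not the paper's code.

```python
import torch

def instance_norm(x, eps=1e-5):
    """x: (batch, channels, time). Remove per-utterance channel statistics."""
    mu = x.mean(dim=2, keepdim=True)           # per-channel mean over time
    var = x.var(dim=2, keepdim=True)           # per-channel variance over time
    return (x - mu) / torch.sqrt(var + eps)    # speaker-normalized content

content = instance_norm(torch.randn(4, 80, 128))  # e.g. 80-dim mel features
```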
Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization
- Computer Science · ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019
Experimental results demonstrate that the proposed method can disentangle speaker and noise attributes even if they are correlated in the training data, and can be used to consistently synthesize clean speech for all speakers.
Unsupervised Domain Adaptation via Domain Adversarial Training for Speaker Recognition
- Computer Science · 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
Experiments demonstrate that the proposed domain adversarial training method is not only effective in solving the dataset mismatch problem, but also outperforms the compared unsupervised domain adaptation methods.
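Domain adversarial training, which the main paper applies across clean and noisy domains, hinges on a gradient reversal layer. Below is a minimal PyTorch sketch of that layer; the names (`GradReverse`, `lambda_`) are illustrative, not taken from any of the cited implementations.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips and scales gradients backward."""

    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversing the gradient trains the upstream encoder to *confuse*
        # the domain (e.g. clean vs. noisy) classifier attached here.
        return -ctx.lambda_ * grad_output, None

# Usage: encoder features pass through the reversal before the domain head.
feats = torch.randn(8, 256, requires_grad=True)
domain_logits = torch.nn.Linear(256, 2)(GradReverse.apply(feats, 1.0))
```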
Variational Domain Adversarial Learning for Speaker Verification
- Computer Science · INTERSPEECH
- 2019
Experiments on both SRE16 and SRE18-CMN2 show that VDANN outperforms the Kaldi baseline and the standard DANN, and results suggest that VAE regularization is effective for domain adaptation.
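The VAE regularization that VDANN adds on top of DANN amounts to a KL term pulling the embedding distribution toward a standard Gaussian. A hedged sketch of that term follows, using the standard VAE KL formula; the variable names are mine, not the paper's.

```python
import torch

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), averaged over the batch."""
    return (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)).mean()

kl = kl_to_standard_normal(torch.zeros(8, 128), torch.zeros(8, 128))  # -> 0
```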
Zero-Shot Voice Style Transfer with Only Autoencoder Loss
- Computer Science · ICML
- 2019
A new style transfer scheme that involves only an autoencoder with a carefully designed bottleneck is proposed, which achieves state-of-the-art results in many-to-many voice conversion with non-parallel data and is the first to perform zero-shot voice conversion.
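At its core, the "carefully designed bottleneck" in this line of work is an aggressive reduction of the content code in time and dimension, so that reconstruction must recover speaker traits from a separately supplied speaker embedding. A toy sketch under that assumption (the downsampling factor and shapes are illustrative, not AutoVC's exact configuration):

```python
import torch

def bottleneck(content, factor=16):
    """content: (batch, time, dim). Keep every `factor`-th frame."""
    return content[:, ::factor, :]

codes = bottleneck(torch.randn(2, 128, 64))  # (2, 128, 64) -> (2, 8, 64)
```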
Optimizing Voice Conversion Network with Cycle Consistency Loss of Speaker Identity
- Computer Science · 2021 IEEE Spoken Language Technology Workshop (SLT)
- 2021
A novel training scheme optimizes the voice conversion network with a speaker identity loss function: besides reducing the frame-level spectral loss, it introduces a cycle consistency loss that constrains the converted speech to maintain the same speaker identity as the reference speech at the utterance level.
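One plausible form of such an utterance-level speaker identity term is a cosine distance between speaker embeddings of the converted and reference speech, sketched below; the paper's exact loss may differ.

```python
import torch
import torch.nn.functional as F

def speaker_cycle_loss(emb_converted, emb_reference):
    """Cosine-distance loss between utterance-level speaker embeddings."""
    return 1.0 - F.cosine_similarity(emb_converted, emb_reference, dim=-1).mean()

loss = speaker_cycle_loss(torch.randn(8, 192), torch.randn(8, 192))
```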
One-Shot Voice Conversion For Style Transfer Based On Speaker Adaptation
- Computer Science · ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2022
This paper proposes a one-shot voice conversion approach for style transfer based on speaker adaptation, adopting weight regularization during adaptation to prevent over-fitting caused by using only one utterance from the target speaker as training data.
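Weight regularization for adaptation is commonly an L2 penalty on drift from the pretrained weights (an L2-SP-style term); the sketch below assumes that form, which may differ from the paper's.

```python
import torch

def weight_reg(model, pretrained_params, alpha=1e-3):
    """Penalize squared deviation of adapted weights from pretrained ones."""
    reg = torch.zeros(())
    for p, p0 in zip(model.parameters(), pretrained_params):
        reg = reg + (p - p0.detach()).pow(2).sum()
    return alpha * reg

net = torch.nn.Linear(4, 4)
frozen = [p.clone() for p in net.parameters()]
penalty = weight_reg(net, frozen)  # zero before any adaptation step
```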