Adversarial Training for Multi-domain Speaker Recognition

  • Qing Wang, Wei Rao, Pengcheng Guo, Lei Xie
  • Published 17 November 2020
  • Computer Science
  • 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)
In real-life applications, the performance of speaker recognition systems degrades when there is a mismatch between training and evaluation data. Many domain adaptation methods have been used successfully to reduce domain mismatch in speaker recognition. However, both the training and evaluation data can themselves be composed of several subsets, and the internal variation within each dataset can also be regarded as distinct domains. Differently distributed subsets in source or…
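Domain adversarial training of this kind is typically built around a gradient reversal layer (GRL): the feature extractor feeds both a speaker classifier and a domain classifier, and the GRL flips the domain-classifier gradients so the learned features become domain-invariant. A minimal sketch of the GRL idea, with an illustrative class name and lambda weight (not the paper's exact implementation):

```python
class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in backward.

    Placed between the feature extractor and the domain classifier so the
    extractor is pushed to *confuse* the domain classifier.
    """

    def __init__(self, lam=1.0):
        self.lam = lam  # trade-off weight, often annealed during training

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad):
        # Flip (and scale) the gradient flowing back from the domain loss.
        return [-self.lam * g for g in grad]


grl = GradientReversal(lam=0.5)
features = grl.forward([1.0, 2.0])      # unchanged
grads = grl.backward([0.2, -0.4])       # reversed and scaled
```

In a full system the speaker-classification loss flows back normally, while the domain-classification loss passes through the GRL, so the two objectives pull the feature extractor in opposite directions.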

Figures and Tables from this paper

Unsupervised Domain Adaptation via Domain Adversarial Training for Speaker Recognition
Experiments demonstrate that the proposed domain adversarial training method is not only effective in solving the dataset mismatch problem, but also outperforms the compared unsupervised domain adaptation methods.
Channel Adversarial Training for Cross-channel Text-independent Speaker Recognition
A novel deep-learning-based speaker recognition framework learns channel-invariant and speaker-discriminative speech representations via channel adversarial training, achieving significant relative improvement on the channel mismatch problem and outperforming state-of-the-art speaker recognition methods.
Variational Domain Adversarial Learning for Speaker Verification
Experiments on both SRE16 and SRE18-CMN2 show that VDANN outperforms the Kaldi baseline and the standard DANN, and results suggest that VAE regularization is effective for domain adaptation.
Cross-lingual Text-independent Speaker Verification Using Unsupervised Adversarial Discriminative Domain Adaptation
  • Wei Xia, Jing Huang, J. Hansen
  • Computer Science
    ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2019
Data analysis of ADDA-adapted speaker embeddings shows that the learned embeddings perform well on speaker classification for the target-domain data and are less dependent on the shift in language.
Autoencoder Based Domain Adaptation for Speaker Recognition Under Insufficient Channel Information
The proposed approach combines an autoencoder with a denoising autoencoder to adapt a resource-rich development dataset to the test domain, exploiting a limited in-domain dataset effectively, and shows significant improvements over baselines and results from prior studies.
Unsupervised Adaptation with Adversarial Dropout Regularization for Robust Speech Recognition
This study optimizes the senone classifier so that its decision boundaries lie near the class boundaries of the unlabeled target data; the feature generator then learns to create features far from those decision boundaries, which are more discriminative.
Variational Domain Adversarial Learning With Mutual Information Maximization for Speaker Verification
Experiments on both SRE16 and SRE18-CMN2 show that the InfoVDANN outperforms the recent VDANN, which suggests that increasing the mutual information between the embedded features and input features enables the InfoVDANN to extract extra speaker information that is otherwise not possible.
Domain Mismatch Compensation for Speaker Recognition Using a Library of Whiteners
The proposed approach to domain mismatch compensation is based on a generalization of data whitening used in association with i-vector length normalization and utilizes a library of whitening transforms trained at system development time using strictly out-of-domain data.
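The whitening-plus-length-normalization pipeline that the library-of-whiteners approach generalizes can be sketched as follows. Function and variable names are illustrative; a real system would train one whitener per out-of-domain subset and select among them at test time:

```python
import numpy as np

def train_whitener(X):
    """Estimate a whitening transform from development i-vectors X (n, d)."""
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)
    # Inverse square root of the covariance via eigendecomposition.
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return mu, W

def whiten_and_length_norm(x, mu, W):
    """Whiten an i-vector, then project it onto the unit sphere."""
    y = W @ (x - mu)
    return y / np.linalg.norm(y)

# Toy correlated development data standing in for out-of-domain i-vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
mu, W = train_whitener(X)
v = whiten_and_length_norm(X[0], mu, W)  # unit-norm, decorrelated vector
```

After whitening, the development data has (approximately) identity covariance, which is the precondition that makes length normalization behave well in PLDA scoring.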
Domain Adversarial Training for Speech Enhancement
A domain adversarial training technique for unsupervised domain transfer that overcomes domain mismatch, and provides a solution to the scenario where the authors only have noisy speech data, and they don't have clean-noisy parallel data in the new domain is proposed.
Inter dataset variability compensation for speaker recognition
  • Hagai Aronowitz
  • Computer Science
    2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2014
This work analyzes the sources of degradation for a particular setup in the context of an i-vector PLDA system and concludes that the main source of degradation is an i-vector dataset shift, for which a compensation method based on nuisance attribute projection (NAP) is introduced.
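NAP removes nuisance directions (here, directions of dataset shift) by projecting i-vectors onto the orthogonal complement of a nuisance subspace. A minimal sketch, assuming the nuisance directions have already been estimated (in practice they come from an eigen-analysis of between-dataset variability):

```python
import numpy as np

def nap_projection(V):
    """Build the NAP projection P = I - V V^T for an orthonormal
    nuisance basis V of shape (d, k)."""
    d = V.shape[0]
    return np.eye(d) - V @ V.T

# Toy example: a single nuisance direction along the first axis.
V = np.array([[1.0], [0.0], [0.0]])
P = nap_projection(V)
x = np.array([3.0, 1.0, 2.0])
x_clean = P @ x  # the nuisance component along V is removed
```

Applying P to all i-vectors before PLDA scoring discards the subspace where dataset shift lives while leaving the remaining speaker-discriminative directions untouched.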