Label-free Knowledge Distillation with Contrastive Loss for Light-weight Speaker Recognition
@article{Peng2022LabelfreeKD,
  title={Label-free Knowledge Distillation with Contrastive Loss for Light-weight Speaker Recognition},
  author={Zhiyuan Peng and Xuanji He and Ke Ding and Tan Lee and Guanglu Wan},
  journal={2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP)},
  year={2022},
  pages={324-328}
}
Very deep models for speaker recognition (SR) have demonstrated remarkable performance improvement in recent research. However, it is impractical to deploy these models for on-device applications with constrained computational resources. On the other hand, light-weight models are highly desired in practice despite their sub-optimal performance. This research aims to improve light-weight SR models through large-scale label-free knowledge distillation (KD). Existing KD approaches for SR typically…
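As a rough illustration of the label-free contrastive distillation idea described in the abstract, the following is a minimal NumPy sketch of an InfoNCE-style loss that pulls each student embedding toward the teacher embedding of the same unlabeled utterance, using the rest of the batch as negatives. The function name, temperature value, and batch layout are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def contrastive_kd_loss(student, teacher, temperature=0.07):
    """InfoNCE-style distillation loss: each student embedding should match
    the teacher embedding of the same (unlabeled) utterance; the other
    utterances in the batch act as negatives, so no speaker labels are needed."""
    # L2-normalize so dot products are cosine similarities.
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = s @ t.T / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal (same utterance index).
    return -np.mean(np.diag(log_probs))
```

When student embeddings are already aligned with the teacher's, the diagonal dominates the similarity matrix and the loss approaches zero; unrelated embeddings yield a loss near log(batch size).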
References
Showing 1-10 of 40 references
Knowledge Distillation for Small Foot-print Deep Speaker Embedding
- Computer Science · ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019
Results show that the proposed knowledge distillation methods can significantly boost the performance of highly compact student models.
Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification
- Computer Science · ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2022
The limits of speech representations learned by different self-supervised objectives and datasets for automatic speaker verification (ASV) are explored, especially with a well-recognized SOTA ASV model, ECAPA-TDNN, as a downstream model.
Learning Speaker Embedding with Momentum Contrast
- Computer Science · ArXiv
- 2020
A comparative study confirms that MoCo learns good speaker embeddings, and that fine-tuning the MoCo-trained model reduces the equal error rate (EER) compared to a carefully tuned baseline trained from scratch.
X-Vectors: Robust DNN Embeddings for Speaker Recognition
- Computer Science · 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.
Self-Supervised Text-Independent Speaker Verification Using Prototypical Momentum Contrastive Learning
- Computer Science · ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2021
This work examines a simple contrastive learning approach (SimCLR) together with a momentum contrastive (MoCo) learning framework, in which the MoCo speaker embedding system uses a queue to maintain a large set of negative examples.
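The negative-example queue mentioned in the summary above can be sketched as a fixed-size FIFO buffer of past embeddings; the `NegativeQueue` class and its default size below are illustrative assumptions, not code from the cited paper.

```python
from collections import deque
import numpy as np

class NegativeQueue:
    """MoCo-style fixed-size FIFO queue of past key embeddings used as
    negatives: each new batch enqueues its keys and the oldest are evicted."""
    def __init__(self, maxlen=4096):
        self.queue = deque(maxlen=maxlen)  # deque handles eviction itself

    def enqueue(self, keys):
        for k in keys:
            self.queue.append(np.asarray(k, dtype=np.float64))

    def negatives(self):
        # Stack the stored embeddings into a (n, dim) array of negatives.
        return np.stack(self.queue) if self.queue else np.empty((0,))
```

Decoupling the negative pool from the batch size this way is what lets MoCo-style training use far more negatives than a single batch could hold.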
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
- Computer Science · NeurIPS
- 2020
We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being…
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
- Computer Science · IEEE Journal of Selected Topics in Signal Processing
- 2022
A new pre-trained model, WavLM, is proposed to solve full-stack downstream speech tasks; it achieves state-of-the-art performance on the SUPERB benchmark and brings significant improvements to various speech processing tasks on their representative benchmarks.
Deep Normalization for Speaker Vectors
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2021
It is argued that deep speaker vectors require deep normalization, and a deep normalization approach based on a novel discriminative normalization flow (DNF) model is proposed; experiments on the widely used SITW and CNCeleb corpora demonstrate the effectiveness of the approach.
In defence of metric learning for speaker recognition
- Computer Science · INTERSPEECH
- 2020
It is demonstrated that the vanilla triplet loss shows competitive performance compared to classification-based losses, and that models trained with the proposed metric learning objective outperform state-of-the-art methods.
Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification
- Computer Science · ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2022
This paper proposes an innovative asymmetric structure that uses the large-scale ECAPA-TDNN model for enrollment and the small-scale ECAPA-TDNNLite model for verification, reducing the EER to 2.31%.