• Corpus ID: 239016815

Tackling the Score Shift in Cross-Lingual Speaker Verification by Exploiting Language Information

@article{Thienpondt2021TacklingTS,
  title={Tackling the Score Shift in Cross-Lingual Speaker Verification by Exploiting Language Information},
  author={Jenthe Thienpondt and Brecht Desplanques and Kris Demuynck},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.09150}
}
This paper contains a post-challenge performance analysis on cross-lingual speaker verification of the IDLab submission to the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). We show that current speaker embedding extractors consistently underestimate speaker similarity in within-speaker cross-lingual trials. Consequently, the typical training and scoring protocols do not put enough emphasis on the compensation of intra-speaker language variability. We propose two techniques to increase…
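
The score shift described in the abstract can be illustrated with a minimal sketch. It assumes cosine scoring of speaker embeddings and a hypothetical trial format of `(enroll_emb, test_emb, enroll_lang, test_lang)` tuples for within-speaker trials; the function names are illustrative and are not taken from the paper, and the sketch only measures the shift rather than implementing either of the proposed techniques.

```python
import math
from statistics import mean


def cosine_score(a, b):
    """Cosine similarity between two embedding vectors given as sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def target_score_shift(trials):
    """Mean same-language score minus mean cross-language score.

    `trials` is a hypothetical list of within-speaker (target) trials,
    each a tuple (enroll_emb, test_emb, enroll_lang, test_lang).
    """
    same, cross = [], []
    for enroll_emb, test_emb, enroll_lang, test_lang in trials:
        score = cosine_score(enroll_emb, test_emb)
        # Split target trials by whether both utterances share a language label.
        (same if enroll_lang == test_lang else cross).append(score)
    if not same or not cross:
        raise ValueError("need at least one same-language and one cross-language trial")
    return mean(same) - mean(cross)
```

A positive return value would indicate that within-speaker cross-lingual trials score lower on average than within-speaker same-language trials, which is the direction of the underestimation reported in the abstract.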

References

SHOWING 1-10 OF 25 REFERENCES
The IDLAB VoxCeleb Speaker Recognition Challenge 2020 System Description
TLDR
This technical report describes the IDLAB top-scoring submissions for the VoxCeleb Speaker Recognition Challenge 2020 (VoxSRC-20) in the supervised and unsupervised speaker verification tracks with a large margin fine-tuning strategy.
Speaker Recognition for Multi-speaker Conversations Using X-vectors
TLDR
It is found that diarization substantially reduces error rate when there are multiple speakers, while maintaining excellent performance on single-speaker recordings.
Integrating Frequency Translational Invariance in TDNNs and Frequency Positional Information in 2D ResNets to Enhance Speaker Verification
This paper describes the IDLab submission for the text-independent task of the Short-duration Speaker Verification Challenge 2021 (SdSVC-21). This speaker verification competition focuses on short
Analysis of Score Normalization in Multilingual Speaker Recognition
TLDR
The analysis shows that adaptive score normalization (using the top-scoring files per trial) selects cohorts that in 68% of cases contain recordings from the same language, and in 92% of cases from the same gender, as the enrollment and test recordings.
The SpeakIn System for VoxCeleb Speaker Recognition Challange 2021
  • Miao Zhao, Yufeng Ma, Min Liu, Minqiang Xu
  • Computer Science, Engineering
    ArXiv
  • 2021
TLDR
This report describes several components of the VoxCeleb Speaker Recognition Challenge 2021 submission, which is a fusion of 9 models: data augmentation, network structures, domain-based large margin fine-tuning, and back-end refinement.
VoxCeleb: A Large-Scale Speaker Identification Dataset
TLDR
This paper proposes a fully automated pipeline based on computer vision techniques to create a large scale text-independent speaker identification dataset collected 'in the wild', and shows that a CNN based architecture obtains the best performance for both identification and verification.
X-Vectors: Robust DNN Embeddings for Speaker Recognition
TLDR
This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.
Comparison of Speaker Recognition Approaches for Real Applications
TLDR
This paper describes the experimental setup and the results obtained using several state-of-the-art speaker recognition classifiers, and shows that the classifiers based on i-vectors obtain the best recognition and calibration accuracy.
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
TLDR
This work presents SpecAugment, a simple data augmentation method for speech recognition that is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients) and achieves state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work.
VOXLINGUA107: A Dataset for Spoken Language Recognition
TLDR
This paper generates semi-random search phrases from language-specific Wikipedia data that are then used to retrieve videos from YouTube for 107 languages and uses the data to build language recognition models for several spoken language identification tasks.