• Publications
X-Vectors: Robust DNN Embeddings for Speaker Recognition
TLDR
This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.
Analysis of i-vector Length Normalization in Speaker Recognition Systems
TLDR
The proposed approach deals with the non-Gaussian behavior of i-vectors by performing a simple length normalization, which allows the use of probabilistic models with Gaussian assumptions that perform on par with more complicated systems based on heavy-tailed assumptions.
Deep Neural Network Embeddings for Text-Independent Speaker Verification
TLDR
It is found that the embeddings outperform i-vectors for short speech segments and are competitive on long duration test conditions, which are the best results reported for speaker-discriminative neural networks when trained and tested on publicly available corpora.
Speaker diarization with PLDA i-vector scoring and unsupervised calibration
TLDR
A system that incorporates probabilistic linear discriminant analysis (PLDA) for i-vector scoring and uses unsupervised calibration of the PLDA scores to determine the clustering stopping criterion is proposed, and it is shown that PLDA scoring outperforms the same system with cosine scoring, and that overlapping segments reduce diarization error rate (DER) as well.
Deep neural network-based speaker embeddings for end-to-end speaker verification
TLDR
It is shown that, given a large number of training speakers, the proposed system outperforms an i-vector baseline in equal error rate (EER) and at low miss rates.
Spoken Language Recognition using X-vectors
TLDR
This paper applies x-vectors to the task of spoken language recognition, and experiments with several variations of the x-vector framework, finding that the best performing system uses multilingual bottleneck features, data augmentation, and a discriminative Gaussian classifier.
Speaker Recognition for Multi-speaker Conversations Using X-vectors
TLDR
It is found that diarization substantially reduces error rate when there are multiple speakers, while maintaining excellent performance on single-speaker recordings.
Speaker diarization using deep neural network embeddings
TLDR
This work proposes an alternative approach for learning representations via deep neural networks to remove the i-vector extraction process from the pipeline entirely and shows that, though this approach does not respond as well to unsupervised calibration strategies as previous systems, the incorporation of well-founded speaker priors sufficiently mitigates this shortcoming.
Time delay deep neural network-based universal background models for speaker recognition
TLDR
This study investigates a lightweight alternative in which a supervised GMM is derived from the TDNN posteriors; it maintains the speed of the traditional unsupervised GMM but achieves a 20% relative improvement in EER.
Supervised domain adaptation for I-vector based speaker recognition
TLDR
It is observed that adapting the PLDA parameters (i.e., the across-class and within-class covariances) produces the largest gains, and that length normalization is also important, whereas using an in-domain UBM and T matrix is not crucial.
...
...