Analysis of Language Dependent Front-End for Speaker Recognition

@inproceedings{Madikeri2018AnalysisOL,
  title={Analysis of Language Dependent Front-End for Speaker Recognition},
  author={Srikanth R. Madikeri and Subhadeep Dey and Petr Motl{\'i}{\v{c}}ek},
  booktitle={INTERSPEECH},
  year={2018}
}
In Deep Neural Network (DNN) i-vector based speaker recognition systems, acoustic models trained for Automatic Speech Recognition are employed to estimate sufficient statistics for i-vector modeling. The DNN based acoustic model is typically trained on a well-resourced language like English. In evaluation conditions where enrollment and test data are not in English, as in the NIST SRE 2016 dataset, a DNN acoustic model generalizes poorly. In such conditions, a conventional Universal Background…
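As background on the pipeline the abstract summarizes: the "sufficient statistics" in DNN i-vector systems are zeroth- and first-order Baum-Welch statistics, accumulated per senone from the DNN's per-frame posteriors instead of from UBM component posteriors. A minimal NumPy sketch (array names and shapes are illustrative, not the paper's code):

```python
import numpy as np

def baum_welch_stats(features, posteriors):
    """Accumulate zeroth- and first-order Baum-Welch statistics.

    features:   (T, D) array of acoustic frames (e.g. MFCCs)
    posteriors: (T, C) array of per-frame senone posteriors, e.g. the
                softmax output of a DNN acoustic model
    Returns N (C,) zeroth-order and F (C, D) first-order statistics.
    """
    N = posteriors.sum(axis=0)   # soft frame counts per senone
    F = posteriors.T @ features  # posterior-weighted feature sums
    return N, F

# Toy usage with random data standing in for real MFCCs and DNN outputs.
rng = np.random.default_rng(0)
T, D, C = 100, 20, 8
feats = rng.standard_normal((T, D))
post = rng.random((T, C))
post /= post.sum(axis=1, keepdims=True)  # rows sum to 1, like a softmax
N, F = baum_welch_stats(feats, post)
print(N.shape, F.shape)  # → (8,) (8, 20)
```

Because each posterior row sums to one, the zeroth-order statistics sum to the number of frames T, regardless of which model produced the posteriors — this is what lets a DNN acoustic model drop into the classical i-vector recipe.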


Analysis of Posterior Estimation Approaches to I-vector Extraction for Speaker Recognition
It is shown that better alignments of speech frames can lead to superior speaker verification performance, and a direct correlation exists between senone recognition accuracy of the system generating the posterior and the performance of corresponding speaker recognition systems.
IDIAP SUBMISSION TO THE NIST SRE 2016 SPEAKER RECOGNITION EVALUATION
Idiap has made one submission to the fixed condition of the NIST SRE 2016. It consists of three systems: a gender-dependent i-vector system, a gender-independent x-vector system and a…
Supervised Domain Adaptation for Text-Independent Speaker Verification Using Limited Data
It is hypothesize that low-level convolutional neural network (CNN) layers characterize domain-specific component while high-level CNN layers are domain-independent and have more discriminative power.

References

SHOWING 1-10 OF 19 REFERENCES
A novel scheme for speaker recognition using a phonetically-aware deep neural network
We propose a novel framework for speaker recognition in which extraction of sufficient statistics for the state-of-the-art i-vector model is driven by a deep neural network (DNN) trained for…
The IBM 2016 Speaker Recognition System
Experimental results indicate that the nearest-neighbor discriminant analysis (NDA) approach is more effective than the traditional parametric LDA for speaker recognition, when compared to raw acoustic features.
Employment of Subspace Gaussian Mixture Models in speaker recognition
Experimental results reveal that while the i-vector system performs better on utterances truncated to 3 sec to 10 sec and 10 sec to 30 sec, noticeable improvements are observed with SGMMs, especially on full-length utterances.
Deep Neural Network Approaches to Speaker and Language Recognition
This work presents the application of a single DNN for both SR and LR using the 2013 Domain Adaptation Challenge speaker recognition (DAC13) and the NIST 2011 language recognition evaluation (LRE11) benchmarks, and demonstrates large performance gains.
Deep neural network based posteriors for text-dependent speaker verification
A Deep Neural Network/Hidden Markov Model Automatic Speech Recognition (DNN/HMM ASR) system is used to extract content-related posterior probabilities, and outperforms systems using Gaussian mixture model posteriors by at least 50% relative Equal Error Rate (EER) on RSR2015 in content-mismatch trials.
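Several snippets on this page report results as Equal Error Rate (EER): the operating point at which the false-acceptance and false-rejection rates coincide. A rough sketch of how it can be estimated from verification scores by sweeping a decision threshold (names and data illustrative, not any cited system's scoring code):

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        frr = np.mean(target_scores < t)      # false rejection rate
        far = np.mean(nontarget_scores >= t)  # false acceptance rate
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Well-separated toy scores give an EER of zero.
tgt = np.array([2.0, 2.5, 3.0, 3.5])   # genuine-speaker trial scores
non = np.array([-1.0, -0.5, 0.0, 0.5])  # impostor trial scores
print(equal_error_rate(tgt, non))  # → 0.0
```

A "50% relative EER improvement" then simply means the new system's EER is at most half the baseline's.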
Deep Neural Networks for extracting Baum-Welch statistics for Speaker Recognition
Although the proposed i-vectors yield inferior performance compared to the standard ones, they are capable of attaining 16% relative improvement when fused with them, meaning that they carry useful complementary information about the speaker’s identity.
Exploiting sequence information for text-dependent Speaker Verification
Dynamic time warping over speaker-informative features computed from short segments of each speech utterance (online i-vectors) is proposed, providing a 75% relative equal error rate improvement over the best model-based SV baseline system in a content-mismatch condition.
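Dynamic time warping, as used in the snippet above to align sequences of segment-level i-vectors, is a classic dynamic program over pairwise frame distances. A minimal illustration (not the cited paper's implementation; sequence contents are placeholders):

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Minimal accumulated Euclidean cost of aligning two feature sequences.

    seq_a: (Ta, D) array, seq_b: (Tb, D) array.
    """
    Ta, Tb = len(seq_a), len(seq_b)
    acc = np.full((Ta + 1, Tb + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            cost = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j],      # skip a frame of seq_a
                                   acc[i, j - 1],      # skip a frame of seq_b
                                   acc[i - 1, j - 1])  # match the two frames
    return acc[Ta, Tb]

# Identical sequences align with zero cost.
a = np.arange(12, dtype=float).reshape(4, 3)
print(dtw_distance(a, a))  # → 0.0
```

In a text-dependent setting, the DTW cost between enrollment and test sequences doubles as a verification score that is sensitive to both speaker identity and spoken content.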
Analysis of DNN approaches to speaker identification
This work studies the usage of the Deep Neural Network (DNN) Bottleneck (BN) features together with the traditional MFCC features in the task of i-vector-based speaker recognition. We decouple the…
Deep Neural Network Embeddings for Text-Independent Speaker Verification
It is found that the embeddings outperform i-vectors for short speech segments and are competitive on long duration test conditions, which are the best results reported for speaker-discriminative neural networks when trained and tested on publicly available corpora.
IDIAP SUBMISSION TO THE NIST SRE 2016 SPEAKER RECOGNITION EVALUATION
Idiap has made one submission to the fixed condition of the NIST SRE 2016. It consists of three systems: a gender-dependent i-vector system, a gender-independent x-vector system and a…