This paper presents an extension of our previous work which proposes a new speaker representation for speaker verification. In this modeling, a new low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis. This space is named the total variability space because it models both speaker and channel variabilities. Two …
We compare two approaches to the problem of session variability in GMM-based speaker verification, eigen-channels and joint factor analysis, on the NIST 2005 speaker recognition evaluation data. We show how the two approaches can be implemented using essentially the same software at all stages except for the enrollment of target speakers. We demonstrate …
We propose a new approach to the problem of estimating the hyperparameters which define the interspeaker variability model in joint factor analysis. We tested the proposed estimation technique on the NIST 2006 speaker recognition evaluation data and obtained 10%–15% reductions in error rates on the core condition and the extended data condition (as …
We present the results of speaker verification experiments conducted on the NIST 2005 evaluation data using a factor analysis of speaker and session variability in 6 telephone speech corpora distributed by the Linguistic Data Consortium. We demonstrate the effectiveness of zt-norm score normalization and a new decision criterion for speaker recognition …
We present a corpus-based approach to speaker verification in which maximum likelihood II criteria are used to train a large-scale generative model of speaker and session variability which we call joint factor analysis. Enrolling a target speaker consists in calculating the posterior distribution of the hidden variables in the factor analysis model and …
We show how the factor analysis model for speaker verification can be successfully implemented using some fast approximations which result in minor degradations in accuracy and open up the possibility of training the model on very large databases such as the union of all of the Switchboard corpora. We tested our algorithms on the NIST 1999 evaluation set …
We discuss the limitations of the i-vector representation of speech segments in speaker recognition and explain how Joint Factor Analysis (JFA) can serve as an alternative feature extractor in a variety of ways. Building on the work of Zhao and Dong, we implemented a variational Bayes treatment of JFA which accommodates adaptation of universal background …
The duration of speech segments has traditionally been controlled in the NIST speaker recognition evaluations so that researchers working in this framework have been relieved of the responsibility of dealing with the duration variability that arises in practical applications. The fixed-dimensional i-vector representation of speech utterances is ideal for …
State-of-the-art speaker recognition systems are based on the i-vector representation of speech segments. In this paper we show how this representation can be used to perform blind speaker adaptation of a hybrid DNN-HMM speech recognition system and we report excellent results on a French language audio transcription task. The implementation is very simple. …