Speaker and Channel Factors in Text-Dependent Speaker Recognition
We tackle the problem of text-dependent speaker verification using a version of Joint Factor Analysis (JFA) in which speaker-phrase variability is modeled with a factorial prior and channel variability with a subspace prior. We implemented this using Zhao and Dong’s variational Bayes algorithm, an extension of Vogt’s Gauss-Seidel method that supports UBM adaptation to the speaker and channel effects in enrollment and test utterances. We report results on the RSR2015 dataset obtained with two types of likelihood ratio and several strategies for UBM adaptation. We found that using a large UBM and decomposing JFA into a feature extractor and a simple back-end classifier (in a way broadly analogous to the i-vector/PLDA cascade) gives better results than using likelihood ratios of either type to make verification decisions. This method involves no UBM adaptation other than to the lexical content of utterances, and it is based on Vogt’s algorithm rather than Zhao and Dong’s. It achieves an equal error rate (EER) of 0.5% on the RSR2015 evaluation set.
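The alternating idea behind Vogt’s Gauss-Seidel method can be illustrated with a minimal sketch. It assumes the usual JFA form in which an utterance's supervector offset from the UBM means is U x + D z, with U a low-rank channel subspace (the subspace prior) and D diagonal, so that the prior on the speaker-phrase factors z factorizes over supervector dimensions (the factorial prior). It also assumes that zeroth- and first-order Baum-Welch statistics have already been extracted against a diagonal-covariance UBM. The function and argument names are hypothetical. The sketch computes point estimates of x and z by repeatedly solving for each with the other held fixed. It does not cover UBM adaptation or Zhao and Dong’s variational Bayes updates.

```python
import numpy as np


def gauss_seidel_jfa(N, F, Sigma, U, d, n_iter=50):
    """Alternating (Gauss-Seidel) point estimates of channel factors x and
    speaker-phrase factors z for the offset model U @ x + d * z.

    Hypothetical inputs for one utterance (supervector length CF):
      N     : (CF,) zeroth-order stats, each component's count repeated per feature dim
      F     : (CF,) first-order stats centred on the UBM means
      Sigma : (CF,) diagonal UBM covariances
      U     : (CF, R) channel subspace (low-rank prior on x)
      d     : (CF,) diagonal of D (factorial prior on z)
    """
    R = U.shape[1]
    x = np.zeros(R)
    z = np.zeros_like(d)
    # Precision of x given z is I + U^T Sigma^-1 N U; it does not depend on z.
    L = np.eye(R) + (U.T * (N / Sigma)) @ U
    # Precision of z given x is diagonal because D is diagonal.
    z_prec = 1.0 + d**2 * N / Sigma
    for _ in range(n_iter):
        # Update x with z held fixed: remove the D z contribution from F.
        x = np.linalg.solve(L, U.T @ ((F - N * d * z) / Sigma))
        # Update z with x held fixed: every dimension is solved independently.
        z = d * (F - N * (U @ x)) / Sigma / z_prec
    return x, z
```

Because the joint posterior over x and z is Gaussian, these alternating updates converge to the same point as solving the full joint linear system directly. However, they never need to invert a matrix as large as the supervector.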