Factor Analysis Based Speaker Verification Using ASR

Authors: Hang Su and Steven Wegmann
In this paper, we propose to improve speaker verification performance by importing better posterior statistics from acoustic models trained for Automatic Speech Recognition (ASR). This approach aims to bring state-of-the-art ASR techniques to the speaker verification task. We compare statistics collected from several ASR systems, and show that those collected from deep neural networks (DNN) trained with fMLLR features can effectively reduce equal error rate (EER) by more than 30% on NIST SRE…
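To make "posterior statistics" concrete, here is a minimal sketch of how zeroth- and first-order statistics are commonly accumulated from per-frame class posteriors in factor-analysis (i-vector style) pipelines. It follows the generic textbook formulation and is not the authors' implementation. The function name `baum_welch_stats` and the toy data are hypothetical. In an ASR-driven setup the posteriors would come from the acoustic model (for example, DNN outputs) rather than from a GMM.

```python
from typing import List, Sequence, Tuple


def baum_welch_stats(
    features: Sequence[Sequence[float]],
    posteriors: Sequence[Sequence[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate zeroth- and first-order statistics per class.

    features[t]   : feature vector of frame t (length D)
    posteriors[t] : class posteriors of frame t (length C)
    Returns (N, F) with N[c] = sum_t gamma_t(c) and
    F[c] = sum_t gamma_t(c) * x_t.
    """
    if len(features) != len(posteriors):
        raise ValueError("features and posteriors must have the same number of frames")
    if not features:
        raise ValueError("at least one frame is required")

    num_classes = len(posteriors[0])
    dim = len(features[0])
    zeroth = [0.0] * num_classes
    first = [[0.0] * dim for _ in range(num_classes)]

    for x_t, gamma_t in zip(features, posteriors):
        for c, gamma in enumerate(gamma_t):
            zeroth[c] += gamma  # soft frame count for class c
            for d, value in enumerate(x_t):
                first[c][d] += gamma * value  # posterior-weighted feature sum

    return zeroth, first


# Toy example (hypothetical data): 3 frames, 2-dimensional features, 2 classes.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
posteriors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
zeroth, first = baum_welch_stats(features, posteriors)
```

Swapping the source of `posteriors` while keeping the accumulation unchanged is the kind of substitution the abstract describes: the downstream factor-analysis model consumes the same statistics regardless of which model produced the frame alignments.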


Content Normalization for Text-Dependent Speaker Verification
Improves an i-vector system by normalizing the content of the enrollment data to match the test data, achieving a 12% relative improvement in equal error rate over a GMM-UBM based baseline system.
Template-matching for text-dependent speaker verification
Phonetic aware techniques for Speaker Verification
This thesis proposes techniques that exploit phonetic knowledge of speech to further enrich speaker characteristics, drawing on diverse phonetic information extracted with methods such as automatic speech recognition (ASR).
Combining Speech and Speaker Recognition - A Joint Modeling Approach
A unified model is developed that is trained jointly for speech and speaker recognition and can effectively perform both ASR and SRE tasks; experiments show that the JointDNN model is more effective in speaker recognition than an x-vector system, given a limited amount of training data.
Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages
This paper presents a set of experiments to investigate the impact of domain mismatch on the performance of neural LID systems for a subset of six Slavic languages across two domains and examines two low-level signal descriptors (spectral and cepstral features) for this task.


A novel scheme for speaker recognition using a phonetically-aware deep neural network
We propose a novel framework for speaker recognition in which extraction of sufficient statistics for the state-of-the-art i-vector model is driven by a deep neural network (DNN) trained for
Time delay deep neural network-based universal background models for speaker recognition
This study investigates a lightweight alternative in which a supervised GMM is derived from the TDNN posteriors, which maintains the speed of the traditional unsupervised-GMM, but achieves a 20% relative improvement in EER.
Front-End Factor Analysis For Speaker Verification
  • Florin Curelaru
  • Computer Science
    2018 International Conference on Communications (COMM)
  • 2018
This paper investigates which configuration and parameters lead to the best performance of an i-vector/PLDA based speaker verification system, and closes with preliminary experiments in which utterances from the CSTR VCTK corpus were used alongside utterances from MIT-MDSVC to train the total variability covariance matrix and the underlying PLDA matrices.
Bayesian Speaker Verification with Heavy-Tailed Priors
A new approach to speaker verification is described which is based on a generative model of speaker and channel effects but differs from Joint Factor Analysis in several respects, including that each utterance is represented by a low dimensional feature vector rather than by a high dimensional set of Baum-Welch statistics.
Deep Neural Network Approaches to Speaker and Language Recognition
This work presents the application of a single DNN for both SR and LR using the 2013 Domain Adaptation Challenge speaker recognition (DAC13) and the NIST 2011 language recognition evaluation (LRE11) benchmarks and demonstrates large gains in performance.
Linear discriminant analysis for improved large vocabulary continuous speech recognition
  • Reinhold Häb-Umbach, H. Ney
  • Computer Science
    [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 1992
The interaction of linear discriminant analysis (LDA) and a modeling approach using continuous Laplacian mixture density HMM is studied experimentally. The largest improvements in speech recognition
The speaker partitioning problem
We give a unification of several different speaker recognition problems in terms of the general speaker partitioning problem, where a set of N inputs has to be partitioned into subsets according to
Maximum likelihood linear transformations for HMM-based speech recognition
  • M. Gales
  • Computer Science
    Comput. Speech Lang.
  • 1998
The paper compares the two possible forms of model-based transforms: unconstrained, where any combination of mean and variance transform may be used, and constrained, which requires the variance transform to have the same form as the mean transform.
Improved vocabulary-independent sub-word HMM modelling
  • L. C. Wood, D. Pearce, F. Novello
  • Computer Science
    [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing
  • 1991
The authors describe two techniques for improving the performance of subword recognition on open vocabularies using vocabulary-independent training. The first uses a subtriphone unit called a
Boosted MMI for model and feature-space discriminative training
A modified form of the maximum mutual information (MMI) objective function gives improved results for discriminative training by boosting the likelihoods of paths in the denominator lattice that have a higher phone error relative to the correct transcript.