• Corpus ID: 8581960

Analysis of i-vector Length Normalization in Speaker Recognition Systems

@inproceedings{GarciaRomero2011AnalysisOI,
  title={Analysis of i-vector Length Normalization in Speaker Recognition Systems},
  author={Daniel Garcia-Romero and Carol Y. Espy-Wilson},
  booktitle={INTERSPEECH},
  year={2011}
}
We present a method to boost the performance of probabilistic generative models that work with i-vector representations. […] This non-linear transformation allows the use of probabilistic models with Gaussian assumptions that perform on par with more complicated systems based on heavy-tailed assumptions. Significant performance improvements are demonstrated on the telephone portion of NIST SRE 2010.
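The sketch below illustrates the kind of non-linear transformation described above. It assumes the common length-normalization recipe: center and whiten i-vectors using statistics from a development set, then scale each vector to unit Euclidean length. The function names and the choice of ZCA whitening are illustrative assumptions. They are not taken from this page.

```python
import numpy as np

def fit_whitener(dev: np.ndarray):
    """Estimate a mean and a whitening matrix from development i-vectors (one per row).

    Assumption: whitening uses the full sample covariance of the development set.
    """
    mu = dev.mean(axis=0)
    cov = np.cov(dev - mu, rowvar=False)
    # Symmetric eigendecomposition: cov = V diag(e) V^T
    eigvals, eigvecs = np.linalg.eigh(cov)
    # ZCA-style whitening matrix W = V diag(e^-1/2) V^T, so that W cov W^T = I
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return mu, W

def length_normalize(x: np.ndarray, mu: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Center and whiten each i-vector (row), then project it onto the unit sphere."""
    z = (x - mu) @ W.T
    return z / np.linalg.norm(z, axis=1, keepdims=True)
```

After this mapping, every i-vector lies on the unit sphere. A Gaussian back end such as PLDA can then be trained on the normalized vectors.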


Identify the Benefits of the Different Steps in an i-Vector Based Speaker Verification System
TLDR
This paper analyzes the i-vector paradigm, a compact representation of spoken utterances used by most state-of-the-art speaker verification systems. It pays particular attention to how well these systems model data according to a theoretical Gaussian framework.
Nonlinear I-Vector Transformations for PLDA-Based Speaker Recognition
TLDR
This paper proposes to transform the i-vectors so that their distribution becomes more suitable to discriminate speakers using the PLDA model by means of a sequence of affine and nonlinear transformations whose parameters are obtained by maximum likelihood estimation on the development set.
Robust Speaker Recognition Using MAP Estimation of Additive Noise in i-vectors Space
TLDR
A full-covariance Gaussian model of the clean i-vector and noise distributions in the i-vector space is proposed. A MAP technique is also introduced that estimates a clean i-vector given its noisy version and the noise density function.
Minimax i-vector extractor for short duration speaker verification
TLDR
This study proposes a minimax strategy for estimating the sufficient statistics in order to make the extracted i-vectors more robust. Experiments show that the proposed minimax technique improves the baseline result from 9.89% to 7.99% on the NIST SRE 2010 8conv-10sec task.
Discriminatively trained Bayesian speaker comparison of i-vectors
  • B. J. Borgstrom, A. McCree
  • 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
  • 2013
TLDR
A framework for fully Bayesian speaker comparison of i-vectors is presented and shown to be mathematically equivalent to probabilistic linear discriminant analysis (PLDA). The model hyper-parameters are then trained discriminatively by minimizing the total cross-entropy between log-likelihood ratios (LLRs) and class labels.
Reducing Noise Bias in the i-Vector Space for Speaker Recognition
TLDR
Although the proposed method was originally designed to address additive noise, it is shown to also alleviate convolutive nuisance under certain circumstances.
An i-vector backend for speaker verification
TLDR
This work proposes a new approach to uncertainty modeling in text-dependent speaker verification, in which speaker factors serve as the feature representation. It also develops a version of this backend that works with Baum-Welch statistics rather than point estimates.
Effect of multicondition training on i-vector PLDA configurations for speaker recognition
TLDR
This study indicates that applying multicondition training to the PLDA model and, where possible, to the enrollment i-vectors matters most for achieving good performance on noisy evaluation data.
Dealing with additive noise in speaker recognition systems based on i-vector approach
TLDR
A statistical framework is described that supports two options. It can estimate a clean i-vector from a noisy one, or it can integrate statistical knowledge about the noise and the clean i-vectors directly into the scoring phase.
I-vector transformation and scaling for PLDA based speaker recognition
TLDR
The i-vectors are extracted without regard to the classifier that will use them. They are then transformed so that their distribution becomes more suitable for discriminating speakers with PLDA. The transformation is a sequence of affine and non-linear steps whose parameters are obtained by maximum likelihood (ML) estimation on the training set.
...
...

References

SHOWING 1-10 OF 11 REFERENCES
The speaker partitioning problem
We give a unification of several different speaker recognition problems in terms of the general speaker partitioning problem, where a set of N inputs has to be partitioned into subsets according to …
Full-covariance UBM and heavy-tailed PLDA in i-vector speaker verification
TLDR
Universal background models (UBMs) with full-covariance matrices are proposed and thoroughly tested experimentally. The paper also investigates dimensionality reduction of i-vectors before heavy-tailed PLDA (PLDA-HT) modeling.
Bayesian Speaker Verification with Heavy-Tailed Priors
TLDR
A new approach to speaker verification is described that is based on a generative model of speaker and channel effects. It differs from Joint Factor Analysis in several respects. For example, each utterance is represented by a low-dimensional feature vector rather than by a high-dimensional set of Baum-Welch statistics.
Front-End Factor Analysis for Speaker Verification
TLDR
This paper extends previous work on a new speaker representation for speaker verification. It uses a simple factor analysis to define a new low-dimensional speaker- and channel-dependent space. This space is named the total variability space because it models both speaker and channel variability.
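The total variability model can be made concrete with a short numerical sketch. It assumes the standard formulation, in which the i-vector is the posterior mean of the latent factor w given an utterance's Baum-Welch statistics: `w = (I + T^T Σ^{-1} N T)^{-1} T^T Σ^{-1} F̃`. Here the UBM has diagonal covariances, and the first-order statistics F̃ are centered on the UBM means. The array layout (statistics stacked component by component) and the function name are illustrative assumptions.

```python
import numpy as np

def extract_ivector(N, F_centered, T, Sigma_diag):
    """Posterior mean of the total-variability factor (the i-vector).

    N:          (C,)     zeroth-order statistics, one per UBM component
    F_centered: (C*D,)   first-order statistics centered on the UBM means,
                         stacked component by component
    T:          (C*D, R) total variability matrix
    Sigma_diag: (C*D,)   stacked diagonal UBM covariances (assumed diagonal)
    """
    C = N.shape[0]
    D = F_centered.shape[0] // C
    R = T.shape[1]
    # Each of the D feature dimensions of component c is weighted by N[c]
    N_expanded = np.repeat(N, D)
    # T^T Sigma^{-1}: divide column j of T^T by Sigma_diag[j]
    TtSinv = T.T / Sigma_diag
    # Posterior precision: I + T^T Sigma^{-1} N T
    precision = np.eye(R) + (TtSinv * N_expanded) @ T
    return np.linalg.solve(precision, TtSinv @ F_centered)
```

Solving the linear system avoids explicitly inverting the R x R precision matrix.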
Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification
TLDR
A new speaker verification system architecture that uses Joint Factor Analysis (JFA) as a feature extractor is presented. The cosine kernel in the new total factor space is used to design two different systems. The first is based on support vector machines (SVMs), and the second uses the kernel directly as the decision score.
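In its simplest form, using the cosine kernel directly as the decision score looks like the sketch below. It omits any channel compensation applied to the vectors before scoring. The decision threshold is a hypothetical parameter that would in practice be tuned on development data.

```python
import numpy as np

def cosine_score(w_target: np.ndarray, w_test: np.ndarray) -> float:
    """Cosine similarity between a target-speaker vector and a test vector."""
    return float(np.dot(w_target, w_test) /
                 (np.linalg.norm(w_target) * np.linalg.norm(w_test)))

def accept(w_target: np.ndarray, w_test: np.ndarray, threshold: float) -> bool:
    """Accept the trial if the cosine score reaches the (hypothetical) threshold."""
    return cosine_score(w_target, w_test) >= threshold
```

Because the score depends only on the angle between the vectors, it is unaffected by their lengths. For unit-length vectors it reduces to a plain dot product.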
Joint Factor Analysis of Speaker and Session Variability: Theory and Algorithms
We give a full account of the algorithms needed to carry out a joint factor analysis of speaker and session variability in a training set in which each speaker is recorded over many different
Probabilistic Linear Discriminant Analysis for Inferences About Identity
  • S. Prince, J. Elder
  • 2007 IEEE 11th International Conference on Computer Vision
  • 2007
TLDR
This paper models face data as the output of a generative model that incorporates both within-individual and between-individual variation. It then calculates the likelihood that the differences between face images are entirely due to within-individual variability.
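The same-versus-different likelihood comparison can be sketched with a simplified "two-covariance" Gaussian PLDA. This is not the full subspace model of the cited paper. The sketch assumes each vector is x = mu + y + e, where the identity variable y ~ N(0, B) is shared by vectors from the same individual and the residual e ~ N(0, W) is independent per observation. The score is log p(x1, x2 | same) minus log p(x1) p(x2). The names mu, B and W are illustrative.

```python
import numpy as np

def gaussian_logpdf(x: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> float:
    """Log-density of a multivariate Gaussian, computed with slogdet and solve."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return float(-0.5 * (x.shape[0] * np.log(2.0 * np.pi) + logdet + maha))

def plda_llr(x1, x2, mu, B, W) -> float:
    """Two-covariance PLDA log-likelihood ratio: same identity vs. different."""
    tot = B + W  # marginal covariance of a single observation
    # Same identity: the shared y makes the cross-covariance equal to B
    joint_cov = np.block([[tot, B], [B, tot]])
    same = gaussian_logpdf(np.concatenate([x1, x2]),
                           np.concatenate([mu, mu]), joint_cov)
    # Different identities: the two observations are independent
    diff = gaussian_logpdf(x1, mu, tot) + gaussian_logpdf(x2, mu, tot)
    return same - diff
```

Positive scores favor the same-identity hypothesis. When B is zero, the two hypotheses coincide and the score is zero.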
Nonlinear Extraction of Independent Components of Natural Images Using Radial Gaussianization
TLDR
It is shown that distributions of spatially proximal bandpass filter responses are better described as elliptical than as linearly transformed independent sources. Applying radial Gaussianization (RG) to either nearby pairs or blocks of bandpass filter responses reduces dependency significantly more than independent component analysis (ICA) does.
Pattern Recognition and Machine Learning
TLDR
This textbook presents pattern recognition and machine learning from a probabilistic, largely Bayesian perspective.
...
...