The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system, used successfully in several NIST Speaker Recognition Evaluations (SREs), are described.

The individual Gaussian components of a GMM are shown to represent general speaker-dependent spectral shapes that are effective for modeling speaker identity, and the GMM is shown to outperform the other speaker modeling techniques on an identical 16-speaker telephone speech task.

High-performance speaker identification and verification systems based on Gaussian mixture speaker models: robust, statistically based representations of speaker identity, evaluated on four publicly available speech databases.

This work examines the idea of using the GMM supervector in a support vector machine (SVM) classifier and proposes two new SVM kernels, based on distance metrics between GMM models, that produce excellent classification accuracy in a NIST speaker recognition evaluation task.

IEEE International Conference on Acoustics Speech…

14 May 2006

A support vector machine kernel is constructed using the GMM supervector, and, based on this kernel, similarities between the method of SVM nuisance attribute projection (NAP) and recent results in latent factor analysis are shown.

An introduction proposes a modular scheme of the training and test phases of a speaker verification system, and the most commonly used speech parameterization in speaker verification, namely cepstral analysis, is detailed.

This work considers the application of SVMs to speaker and language recognition. It uses a sequence kernel, which compares sequences of feature vectors and produces a measure of similarity, to build upon a simpler mean-squared error classifier and produce a more accurate system.

Gaussian Mixture Model parameters are estimated from training data using the iterative Expectation-Maximization (EM) algorithm or Maximum A Posteriori (MAP) estimation from a well-trained prior model.

Gaussian Mixture Model parameters are estimated from training data using the Expectation-Maximization (EM) algorithm or Maximum A Posteriori (MAP) estimation from a well-trained prior model.

Some of the strengths and weaknesses of current speaker recognition technologies are discussed, and some potential future trends in research, development, and applications are outlined.