• Corpus ID: 228012439

Fuzzy Restricted Boltzmann Machine based Probabilistic Linear Discriminant Analysis for Noise-Robust Text-Dependent Speaker Verification on Short Utterances

Sung-Hyun Yoon, Min-Sung Koh, Ha-jin Yu
In an i-vector-based speaker verification system, it is important to compensate for session variability in the i-vector to improve speaker verification performance. Linear discriminant analysis (LDA) is widely used to compensate for session variability by reducing the dimensionality of the i-vector. Restricted Boltzmann machine (RBM)-based probabilistic linear discriminant analysis (PLDA) has been proposed to improve the session variability compensation ability of LDA. It can be viewed as a…
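As an illustration of the LDA step the abstract mentions, the following is a minimal sketch of plain Fisher LDA in NumPy. It is not the fuzzy RBM-based PLDA proposed in the paper. The synthetic 20-dimensional vectors stand in for i-vectors, and all names (`scatter_matrices`, `fit_lda`, `fisher_ratio`, the speaker means and noise levels) are assumptions made for this example.

```python
import numpy as np


def scatter_matrices(x, labels):
    """Return the within-class and between-class scatter matrices."""
    dim = x.shape[1]
    mean_all = x.mean(axis=0)
    s_w = np.zeros((dim, dim))
    s_b = np.zeros((dim, dim))
    for spk in np.unique(labels):
        x_s = x[labels == spk]
        mean_s = x_s.mean(axis=0)
        centered = x_s - mean_s
        s_w += centered.T @ centered          # session (within-speaker) scatter
        diff = (mean_s - mean_all)[:, None]
        s_b += len(x_s) * (diff @ diff.T)     # speaker (between-speaker) scatter
    return s_w, s_b


def fit_lda(x, labels, n_components, ridge=1e-6):
    """Return a (dim, n_components) projection from the top eigenvectors of inv(S_w) S_b."""
    s_w, s_b = scatter_matrices(x, labels)
    s_w = s_w + ridge * np.eye(s_w.shape[0])  # small ridge keeps S_w invertible
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_w, s_b))
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs[:, order].real


def fisher_ratio(x, labels):
    """Ratio of between-class to within-class scatter (larger means better separated)."""
    s_w, s_b = scatter_matrices(x, labels)
    return np.trace(s_b) / np.trace(s_w)


# Synthetic stand-in for i-vectors (assumption): 5 speakers, 30 sessions each.
# Speaker identity lives in the first 2 dimensions; the rest is session noise.
rng = np.random.default_rng(0)
speaker_means = np.zeros((5, 20))
speaker_means[:, :2] = [[3, 0], [-3, 0], [0, 3], [0, -3], [2, 2]]
noise_std = np.full(20, 5.0)
noise_std[:2] = 0.5
labels = np.repeat(np.arange(5), 30)
vectors = speaker_means[labels] + rng.normal(size=(150, 20)) * noise_std

proj = fit_lda(vectors, labels, n_components=2)
ratio_before = fisher_ratio(vectors, labels)
ratio_after = fisher_ratio(vectors @ proj, labels)
```

In this toy setting, projecting onto two LDA directions discards most of the session noise, so `ratio_after` comes out larger than `ratio_before`. Real i-vector systems typically work with several hundred dimensions and many more speakers.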
1 Citation

Regularized Within-Class Precision Matrix Based PLDA in Text-Dependent Speaker Verification
A method that improves conventional PLDA by estimating the PLDA model with a regularized within-class precision matrix, using the graphical least absolute shrinkage and selection operator (GLASSO) for the regularization.


Front-End Factor Analysis for Speaker Verification
An extension of previous work that proposes a new speaker representation for speaker verification: a new low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis and named the total variability space because it models both speaker and channel variabilities.
i-vector Based Speaker Recognition on Short Utterances
This paper explores how the recent technologies focused around total variability modeling behave when training and testing utterance lengths are reduced, and results are presented which provide a comparison of Joint Factor Analysis and i-vector based systems including various compensation techniques.
Discriminative and generative approaches for long- and short-term speaker characteristics modeling: application to speaker verification
This thesis attempts to overcome difficulties in the area of speaker verification by proposing to combine support vector machines with two generative approaches based on Gaussian mixture models and presents a new approach to modeling the speaker's long-term prosodic and spectral characteristics.
Speaker and Session Variability in GMM-Based Speaker Verification
We present a corpus-based approach to speaker verification in which maximum-likelihood II criteria are used to train a large-scale generative model of speaker and session variability which we call…
PLDA using Gaussian Restricted Boltzmann Machines with application to Speaker Verification
A novel approach to supervised dimensionality reduction is introduced, based on Gaussian Restricted Boltzmann Machines, which attained a significant improvement compared to the Fisher’s Discriminant LDA projection using less than half of the number of eigenvectors required by LDA.
Bayesian Speaker Verification with Heavy-Tailed Priors
A new approach to speaker verification is described which is based on a generative model of speaker and channel effects but differs from Joint Factor Analysis in several respects, including that each utterance is represented by a low-dimensional feature vector rather than by a high-dimensional set of Baum-Welch statistics.
Towards PLDA-RBM based speaker recognition in mobile environment: Designing stacked/deep PLDA-RBM systems
A PLDA-like approach with restricted Boltzmann machines for i-vector based speaker recognition: two deep architectures are presented and examined, which aim at suppressing channel effects and recovering speaker-discriminative information on back-ends trained on a small dataset.
Speaker Verification Using Adapted Gaussian Mixture Models
The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs) are described.
I-vectors in the context of phonetically-constrained short utterances for speaker verification
This study compares two methods, Within Class Covariance Normalization (WCCN) and Eigen Factor Radial (EFR), and suggests that WCCN is more robust to data mismatch but less efficient than EFR when the development data has a better match with the test data.
Speaker Recognition Using Wavelet Packet Entropy, I-Vector, and Cosine Distance Scoring
The results of the experiments show that the proposed model obtains good performance in both clear and noisy environments and is insensitive to low-quality speech, but its time cost is high.