The Mel Filterbank Slope (MFS) feature has been shown to consistently perform better than the conventional Mel Frequency Cepstral Coefficients (MFCC) for speaker recognition. In this work, the issues with respect to the feature's robustness to intersession variability and large dimensionality are addressed. Short-term feature warping is used to improve the ...
This paper presents a Subspace Gaussian Mixture Model (SGMM) approach, employed as a probabilistic generative model to estimate speaker vector representations that are subsequently used in the speaker verification task. SGMMs have already been shown to significantly outperform traditional HMM/GMMs in Automatic Speech Recognition (ASR) applications. An extension ...
A Probabilistic Principal Component Analysis (PPCA) based low-dimensional representation of speech utterances is found to be useful for speaker recognition. Although the performance of the Factor Analysis (FA)-based total variability space model is found to be superior, the hyperparameter estimation procedure in PPCA is computationally efficient. In this work, ...
Despite the superior classification ability of deep neural networks (DNNs), their performance suffers when there is a mismatch between training and testing conditions. Many speaker adaptation techniques have been proposed for DNN acoustic modeling, but progress on environmental robustness is still limited. It is also possible to use ...
In this paper, the Kullback-Leibler Hidden Markov Model (KL-HMM) is applied for unsupervised diarization of speech. A general approach to speaker diarization is to split the audio into uniform segments, followed by one or more iterations of clustering the segments and resegmenting the audio. In the Information Bottleneck (IB) approach to diarization, ...
This report describes an implementation of the standard i-vector-PLDA framework for the Kaldi speech recognition toolkit. The existing speaker recognition system implementation is based on the Subspace Gaussian Mixture Model (SGMM) technique, although it shares many similarities with the standard implementation. In our implementation, we modified the ...
Feature fusion is a paradigm that has found success in a number of speech-related tasks. The primary objective in applying fusion is to leverage the complementary information present in the features. Conventionally, either early or late fusion is employed. Early fusion leads to high-dimensional feature vectors. Further, the range of feature values for ...
Conventional approaches to speaker diarization use short-term features such as Mel Frequency Cepstral Coefficients (MFCC). Features such as i-vectors have been used on longer segments (minimum 2.5 seconds of speech). Using i-vectors for speaker diarization has been shown to be beneficial because they model speaker information explicitly. In this paper, the ...
The i-vector and Joint Factor Analysis (JFA) systems for text-dependent speaker verification use sufficient statistics computed from a speech utterance to estimate speaker models. These statistics average the acoustic information over the utterance, thereby losing all the sequence information. In this paper, we study explicit content matching using Dynamic ...