Srikanth R. Madikeri

This paper presents a Subspace Gaussian Mixture Model (SGMM) approach, employed as a probabilistic generative model to estimate speaker vector representations that are subsequently used in the speaker verification task. SGMMs have already been shown to significantly outperform traditional HMM/GMMs in Automatic Speech Recognition (ASR) applications. An extension …
Despite the superior classification ability of deep neural networks (DNNs), DNN performance suffers when there is a mismatch between training and testing conditions. Many speaker adaptation techniques have been proposed for DNN acoustic modeling, but in the case of environmental robustness the progress is still limited. It is also possible to use …
Probabilistic Principal Component Analysis (PPCA)-based low-dimensional representation of speech utterances has been found useful for speaker recognition. Although the performance of the Factor Analysis (FA)-based total variability space model is found to be superior, the hyperparameter estimation procedure in PPCA is computationally efficient. In this work, …
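To illustrate why PPCA hyperparameter estimation can be cheap, here is a minimal sketch of the classic closed-form maximum-likelihood PPCA solution of Tipping and Bishop, which needs only one eigendecomposition of the sample covariance. It assumes plain ML estimation on generic fixed-length vectors; the abstract does not say which estimation procedure or input representation the paper actually uses, and the names `fit_ppca` and `ppca_latent` are hypothetical.

```python
import numpy as np

def fit_ppca(X, q):
    """Closed-form maximum-likelihood PPCA (Tipping and Bishop, 1999).

    X: (N, D) array, one vector per row; q: latent dimension, q < D.
    Returns the mean mu (D,), loading matrix W (D, q) and noise variance sigma2.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    S = Xc.T @ Xc / X.shape[0]              # ML sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()             # noise = average of the discarded eigenvalues
    # Scale each leading eigenvector by sqrt(lambda_j - sigma2); the rotation ambiguity is fixed to identity.
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return mu, W, sigma2

def ppca_latent(X, mu, W, sigma2):
    """Posterior mean E[z | x] = M^{-1} W^T (x - mu), with M = W^T W + sigma2 * I."""
    M = W.T @ W + sigma2 * np.eye(W.shape[1])
    return np.linalg.solve(M, W.T @ (X - mu).T).T

# Toy data (hypothetical): 500 vectors of dimension 10 from a 2-D latent space plus isotropic noise.
rng = np.random.default_rng(0)
N, D, q = 500, 10, 2
W_true = rng.standard_normal((D, q))
X = rng.standard_normal((N, q)) @ W_true.T + 0.1 * rng.standard_normal((N, D)) + 3.0
mu, W, sigma2 = fit_ppca(X, q)
latent = ppca_latent(X, mu, W, sigma2)
print(W.shape, latent.shape)  # (10, 2) (500, 2)
print(sigma2)                 # should be close to the true noise variance 0.1**2 = 0.01
```

In this sketch the posterior mean E[z | x] plays the role of the low-dimensional representation of each vector. When the input dimension is too large to form the covariance matrix explicitly, an EM-based PPCA fit avoids that step.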
In this paper, a method to use SGMM speaker vectors for speaker diarization is introduced. The architecture of the Information Bottleneck (IB)-based speaker diarization system is used for this purpose. The audio for speaker diarization is split into short uniform segments. Speaker vectors are obtained from a Subspace Gaussian Mixture Model (SGMM) system …
In this paper, filterbank-slope-based features are applied to the Information Bottleneck (IB)-based system for speaker diarization. Filterbank-slope-based features have shown promise in speaker recognition systems owing to their ability to emphasize formants. Hence, it is proposed to study their use in the context of speaker diarization as …
In this paper, the Kullback-Leibler Hidden Markov Model (KL-HMM) is applied to unsupervised diarization of speech. A general approach to speaker diarization is to split the audio into uniform segments, followed by one or more iterations of clustering the segments and resegmenting the audio. In the Information Bottleneck (IB) approach to diarization, …
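The generic pipeline mentioned above (uniform segmentation, clustering of the segments, resegmentation) can be sketched in plain Python. This is a minimal illustration only, assuming frame-level feature vectors are already available, the number of speakers is known, segments are merged bottom-up by Euclidean distance between their means, and resegmentation is crudely approximated by frame-wise nearest-mean reassignment (real systems typically use an HMM/Viterbi pass with minimum-duration constraints). It implements neither the IB nor the KL-HMM method, and all function names are hypothetical.

```python
from math import dist

def uniform_segments(num_frames, seg_len):
    """Split frame indices [0, num_frames) into consecutive (start, end) chunks of seg_len frames."""
    return [(s, min(s + seg_len, num_frames)) for s in range(0, num_frames, seg_len)]

def mean_vector(frames):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(frames) for col in zip(*frames)]

def agglomerative(means, sizes, num_clusters):
    """Bottom-up clustering: repeatedly merge the two clusters whose (size-weighted)
    means are closest in Euclidean distance until num_clusters remain."""
    clusters = [[i] for i in range(len(means))]
    cents = [list(m) for m in means]
    weights = list(sizes)
    while len(clusters) > num_clusters:
        _, a, b = min((dist(cents[a], cents[b]), a, b)
                      for a in range(len(cents)) for b in range(a + 1, len(cents)))
        wa, wb = weights[a], weights[b]
        cents[a] = [(wa * x + wb * y) / (wa + wb) for x, y in zip(cents[a], cents[b])]
        weights[a] = wa + wb
        clusters[a].extend(clusters[b])
        del clusters[b], cents[b], weights[b]
    return clusters, cents

def diarize(features, seg_len, num_speakers, iterations=2):
    """Uniform segmentation -> bottom-up clustering -> simplified frame-level resegmentation."""
    segs = uniform_segments(len(features), seg_len)
    seg_means = [mean_vector(features[s:e]) for s, e in segs]
    clusters, cents = agglomerative(seg_means, [e - s for s, e in segs], num_speakers)
    labels = [0] * len(features)
    for k, members in enumerate(clusters):
        for i in members:
            s, e = segs[i]
            labels[s:e] = [k] * (e - s)
    for _ in range(iterations):
        # ASSUMPTION: crude resegmentation, giving each frame the label of the nearest cluster mean...
        labels = [min(range(len(cents)), key=lambda k: dist(x, cents[k])) for x in features]
        # ...then re-estimating each cluster mean from the frames now assigned to it.
        for k in range(len(cents)):
            members = [x for x, lab in zip(features, labels) if lab == k]
            if members:
                cents[k] = mean_vector(members)
    return labels

def to_turns(labels):
    """Collapse per-frame labels into (start_frame, end_frame, speaker) turns."""
    turns, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            turns.append((start, t, labels[start]))
            start = t
    return turns

# Toy 1-D "features": speaker A near 0.1, then speaker B near 5.1, then speaker A again.
features = ([[0.1 * (t % 3)] for t in range(30)]
            + [[5.0 + 0.1 * (t % 3)] for t in range(30)]
            + [[0.1 * (t % 3)] for t in range(20)])
labels = diarize(features, seg_len=10, num_speakers=2)
print(to_turns(labels))  # [(0, 30, 0), (30, 60, 1), (60, 80, 0)]
```

The 80 toy frames are cut into eight 10-frame segments, the segments are merged into two clusters, and the returned turns recover the A, B, A structure.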
Conventional approaches to speaker diarization use short-term features such as Mel Frequency Cepstral Coefficients (MFCC). Features such as i-vectors have been used on longer segments (minimum 2.5 seconds of speech). Using i-vectors for speaker diarization has been shown to be beneficial as it models speaker information explicitly. In this paper, the …
State-of-the-art speaker recognition systems suffer from significant performance loss under degraded speech conditions and acoustic mismatch between the enrolment and test phases. Past international evaluation campaigns, such as the NIST Speaker Recognition Evaluation (SRE), have partly addressed these challenges in some evaluation conditions. This work aims at …
The aim of the domain-adaptation task for speaker verification is to exploit unlabelled target domain data by using the labelled source domain data effectively. The i-vector based Probabilistic Linear Discriminant Analysis (PLDA) framework approaches this task by clustering the target domain data and using each cluster as a unique speaker to estimate PLDA …
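The idea of clustering unlabelled target-domain data and treating each cluster as a speaker can be sketched as follows. This is a minimal illustration, assuming k-means with farthest-point initialisation as the clustering step and a simplified two-covariance PLDA model whose between-speaker covariance B and within-speaker covariance W are computed directly from cluster statistics rather than trained by EM. The abstract names neither the clustering algorithm nor the PLDA variant, so these choices, the function names and the toy data are all hypothetical.

```python
import numpy as np

def farthest_point_init(X, k):
    """Pick k initial centers: X[0], then repeatedly the point farthest from all chosen centers."""
    idx = [0]
    d = np.linalg.norm(X - X[0], axis=1)
    for _ in range(k - 1):
        i = int(d.argmax())
        idx.append(i)
        d = np.minimum(d, np.linalg.norm(X - X[i], axis=1))
    return X[idx].copy()

def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for every row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, iters=100):
    """ASSUMPTION: plain Lloyd k-means stands in for the paper's (unspecified) clustering."""
    centers = farthest_point_init(X, k)
    for _ in range(iters):
        labels = assign(X, centers)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(X, centers)

def two_covariance_params(X, labels):
    """Treat each cluster as one pseudo-speaker and return (mu, B, W): the global mean,
    the between-speaker covariance and the within-speaker covariance."""
    mu = X.mean(axis=0)
    classes = np.unique(labels)
    class_means = np.array([X[labels == c].mean(axis=0) for c in classes])
    diffs = class_means - mu
    B = diffs.T @ diffs / len(classes)
    resid = X - class_means[np.searchsorted(classes, labels)]  # each vector minus its cluster mean
    W = resid.T @ resid / len(X)
    return mu, B, W

# Toy "target-domain i-vectors": 5 unknown speakers, 100 vectors each, 3 dimensions.
rng = np.random.default_rng(1)
true_means = np.array([[0, 0, 0], [8, 0, 0], [0, 8, 0], [0, 0, 8], [8, 8, 8]], dtype=float)
X = np.repeat(true_means, 100, axis=0) + 0.5 * rng.standard_normal((500, 3))
labels = kmeans(X, k=5)
mu, B, W = two_covariance_params(X, labels)
print(len(np.unique(labels)))  # 5 pseudo-speakers
print(np.trace(W))             # should be close to 3 * 0.5**2 = 0.75
```

Farthest-point initialisation is used here only to keep the toy example deterministic and to avoid the poor local optima that random k-means initialisation can reach; with real target-domain data the number of speakers is itself unknown and has to be chosen.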
Performing speaker diarization while uniquely identifying the speakers in a collection of audio recordings is a challenging task. Based on our previous work on speaker diarization and linking, we developed a system for diarizing longitudinal TV show data sets based on the fusion of speaker diarization system outputs and speaker linking. Agreement between …