Srikanth R. Madikeri

Learn More
Mel Filterbank Slope (MFS) feature has been shown to consistently perform better than the conventional Mel Frequency Cepstral Co-efficients (MFCC) for speaker recognition. In this work, the issues with respect to the feature's robustness to intersession variability and large dimensionality are addressed. Short term feature warping is used to improve the(More)
Performing speaker diarization while uniquely identifying the speakers in a collection of audio recordings is a challenging task. Based on our previous work on speaker diarization and linking, we developed a system for diarizing longitudinal TV show data sets based on the fusion of speaker diarization system outputs and speaker linking. Agreement between(More)
This paper presents Subspace Gaussian Mixture Model (SGMM) approach employed as a probabilistic generative model to estimate speaker vector representations to be subsequently used in the speaker verification task. SGMMs have already been shown to significantly outperform traditional HMM/GMMs in Automatic Speech Recognition (ASR) applications. An extension(More)
The i-vector and Joint Factor Analysis (JFA) systems for text-dependent speaker verification use sufficient statistics computed from a speech utterance to estimate speaker models. These statistics average the acoustic information over the utterance thereby losing all the sequence information. In this paper, we study explicit content matching using Dynamic(More)
In this paper, a method to use SGMM speaker vectors for speaker diarization is introduced. The architecture of the Information Bottleneck (IB) based speaker diarization is utilized for this purpose. The audio for speaker diarization is split into short uniform segments. Speaker vectors are obtained from a Subspace Gaussian Mixture Model (SGMM) system(More)
In this paper, the Kullback-Leibler Hidden Markov Model (KL-HMMs) is applied for unsupervised diarization of speech. A general approach to speaker diarization is to split the audio into uniform segments followed by one or more iterations of clustering of the segments and resegmentation of the audio. In the Information Bottlneck (IB) approach to diarization,(More)
Despite the superior classification ability of deep neural networks (DNN), the performance of DNN suffers when there is a mismatch between training and testing conditions. Many speaker adaptation techniques have been proposed for DNN acoustic modeling but in case of environmental robustness the progress is still limited. It is also possible to use(More)
Conventional approaches to speaker diarization use short-term features such as Mel Frequency Cepstral Co-efficients (MFCC). Features such as i-vectors have been used on longer segments (minimum 2.5 seconds of speech). Using i-vectors for speaker diarization has been shown to be beneficial as it models speaker information explicitly. In this paper, the(More)
In this paper, filterbank slope based features are applied to the Information Bottleneck based system for speaker diarization. The filterbank slope based features have shown promise in the context of speaker recognition systems owing to their ability to emphasize formants. Hence, it is proposed to study their use in the context of speaker diarization as(More)