This paper presents the Speech Technology Center (STC) systems submitted to the Automatic Speaker Verification Spoofing and Countermeasures (ASVspoof) Challenge 2015. In this work we investigate different acoustic feature spaces to determine reliable and robust countermeasures against spoofing attacks. In addition to the commonly used front-end MFCC features, we ...
This paper presents a Speech Technology Center (STC) system submitted to the NIST i-vector Challenge. The system includes different subsystems based on PLDA, LDA-SVM, RBM-PLDA and DBN-PLDA. We propose an original iterative scheme for clustering the NIST i-vector Challenge devset. We also introduce the RBM-PLDA subsystem in the NIST i-vector Challenge.
This paper presents an ITMO University system submitted to the Speakers in the Wild (SITW) Speaker Recognition Challenge. During the evaluation track of the SITW challenge we explored conventional universal background model (UBM) Gaussian mixture model (GMM) i-vector systems and recently developed DNN-posterior-based i-vector systems. The systems were ...
This paper presents the Speech Technology Center (STC) system submitted to the NIST i-vector Challenge. The system includes different subsystems based on TV-PLDA, TV-SVM, and RBM-PLDA. In this paper we focus on examining the third subsystem, RBM-PLDA. Within this subsystem, we present our RBM extractor of the pseudo i-vector. Experiments performed on the test ...
This paper extends previous research by P. Kenny on using a supervised PLDA mixture of two gender-dependent speaker verification systems under conditions of gender uncertainty. We propose using PLDA mixtures for speaker verification in different channels. However, in contrast to creating a gender-independent mixture, the ...
The paper deals with the problem of estimating an optimal i-vector-based speaker voice model from several sessions of the speaker's voice recordings, each of which has different signal parameters: speech duration and SNR. Our aim is to minimize inter-session variability so as to achieve minimal EER in the task of speaker recognition. We examine the influence ...