Speaker recognition systems trained on long duration utterances are known to perform significantly worse when short test segments are encountered. To address this mismatch, we analyze the effect of duration variability on phoneme distributions of speech utterances and i-vector length. We demonstrate that, as utterance duration is decreased, number of...
Inspired by the NIST SRE-2012 evaluation conditions, we train the PLDA classifier in an i-vector speaker recognition system with different speaker populations, either including or excluding the target speakers in the evaluation. Including the target speakers in the PLDA training is always beneficial compared with completely excluding them, which is the normal...
Usually the mel-frequency cepstral coefficients (MFCCs) are derived via a Hamming-windowed DFT spectrum. In this paper, we advocate using a so-called multitaper method instead. Multitaper methods form a spectrum estimate using multiple window functions and frequency-domain averaging. Multitapers provide a robust spectrum estimate but have not received much...
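The multitaper idea described above (several window functions, then averaging in the frequency domain) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the choice of sine tapers, uniform weights, the frame length of 240 samples, and the names `sine_tapers` and `multitaper_spectrum` are all assumptions made for the example.

```python
import numpy as np

def sine_tapers(frame_len, num_tapers):
    """Return a (num_tapers, frame_len) matrix of orthonormal sine tapers."""
    n = np.arange(1, frame_len + 1)
    k = np.arange(1, num_tapers + 1)[:, None]
    return np.sqrt(2.0 / (frame_len + 1)) * np.sin(np.pi * k * n / (frame_len + 1))

def multitaper_spectrum(frame, num_tapers=6, nfft=512):
    """Average the power spectra obtained with several tapers (uniform weights assumed)."""
    tapers = sine_tapers(len(frame), num_tapers)
    weights = np.full(num_tapers, 1.0 / num_tapers)
    # One windowed copy of the frame per taper, one power spectrum per copy.
    spectra = np.abs(np.fft.rfft(tapers * frame, n=nfft, axis=1)) ** 2
    # Frequency-domain averaging: weighted sum over the taper axis.
    return weights @ spectra

# Synthetic frame standing in for a speech frame (hypothetical data).
rng = np.random.default_rng(0)
frame = rng.standard_normal(240)
spec = multitaper_spectrum(frame)
```

Setting `num_tapers=1` reduces this to a single-window estimate, which is the situation the Hamming-windowed DFT represents; the resulting spectrum would then be passed to the usual mel filterbank and cepstral steps.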
Regularization of linear prediction based mel-frequency cepstral coefficient (MFCC) extraction in speaker verification is considered. Commonly, MFCCs are extracted from the discrete Fourier transform (DFT) spectrum of speech frames. In this paper, the DFT spectrum estimate is replaced with the recently proposed regularized linear prediction (RLP) method...
This paper investigates the effect of utterance duration on the calibration of a modern i-vector speaker recognition system with probabilistic linear discriminant analysis (PLDA) modeling. A calibration approach to deal with these effects using quality measure functions (QMFs) is proposed to include duration in the calibration transformation. Extensive...
Different short-term spectrum estimators for speaker verification under additive noise are considered. Conventionally, mel-frequency cepstral coefficients (MFCCs) are computed from discrete Fourier transform (DFT) spectra of windowed speech frames. Recently, linear prediction (LP) and its temporally weighted variants have been substituted as the spectrum...
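As a rough illustration of using linear prediction as the spectrum estimator, the sketch below computes a plain all-pole spectrum with the autocorrelation method and the Levinson-Durbin recursion. It covers only conventional LP, not the temporally weighted or regularized variants mentioned in these abstracts; the model order, FFT size, and the names `lp_coefficients` and `lp_spectrum` are assumptions for the example.

```python
import numpy as np

def lp_coefficients(frame, order):
    """Autocorrelation-method LP via Levinson-Durbin.

    Returns (a, err) with a[0] == 1 so that A(z) = sum_j a[j] z^-j,
    and err the final prediction error energy.
    """
    # Autocorrelation lags 0..order (unnormalized sums).
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1 : len(frame) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                        # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lp_spectrum(frame, order=20, nfft=512):
    """All-pole power spectrum err / |A(e^{jw})|^2 on nfft//2 + 1 bins."""
    a, err = lp_coefficients(frame, order)
    return err / np.abs(np.fft.rfft(a, n=nfft)) ** 2

# Synthetic AR(2) signal with known coefficients (hypothetical data).
rng = np.random.default_rng(0)
true_a = np.array([1.0, -1.3, 0.6])
x = rng.standard_normal(4000)
for n in range(2, len(x)):
    x[n] += 1.3 * x[n - 1] - 0.6 * x[n - 2]

a_hat, err = lp_coefficients(x, order=2)
spec = lp_spectrum(x, order=2)
```

In an MFCC front-end of the kind described, `lp_spectrum` would take the place of the squared magnitude of the windowed DFT before the mel filterbank is applied.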
I4U is a joint entry of nine research institutes and universities across four continents to NIST SRE 2012. It started with a brief discussion during the Odyssey 2012 workshop in Singapore. An online discussion group was soon set up, providing a discussion platform for different issues surrounding NIST SRE'12. Noisy test segments, uneven multi-session training,...
Text-independent speaker verification under additive noise corruption is considered. In the popular mel-frequency cepstral coefficient (MFCC) front-end, the conventional Fourier-based spectrum estimation is substituted with weighted linear predictive methods, which have earlier shown success in noise-robust speech recognition. Two temporally weighted...
In this paper, we propose a closed-loop system to improve the performance of single-channel speech separation in a speaker-independent scenario. The system is composed of two interconnected blocks: a separation block and a speaker identification block. The improvement is accomplished by incorporating the speaker identities found by the speaker identification...