Sandro Cumani

Learn More
Recently, i-vector extraction and Probabilistic Linear Discriminant Analysis (PLDA) have proven to provide state-of-the-art speaker verification performance. In this paper, the speaker verification score for a pair of i-vectors representing a trial is computed with a functional form derived from the successful PLDA generative model. In our case, however,(More)
This work presents a new and efficient approach to discriminative speaker verification in the i-vector space. We illustrate the development of a linear discriminative classifier that is trained to discriminate between the hypothesis that a pair of feature vectors in a trial belong to the same speaker or to different speakers. This approach is alternative to(More)
Phonotactic models based on bags of n-grams representations and discriminative classifiers are a popular approach to the language recognition problem. However, the large size of n-gram count vectors brings about some difficulties in discriminative classifiers. The subspace Multinomial model was recently proposed to effectively represent information(More)
The i-vector extraction process is affected by several factors such as the noise level, the acoustic content of the observed features, and the duration of the analyzed speech segment. These factors influence both the i-vector estimate and its uncertainty, represented by the i-vector posterior covariance. This paper present a new PLDA model that, unlike the(More)
This work presents a new approach to discriminative speaker verification. Rather than estimating speaker models, or a model that discriminates between a speaker class and the class of all the other speakers, we directly solve the problem of classifying pairs of utterances as belonging to the same speaker or not.
Most state–of–the–art speaker recognition systems are based on Gaussian Mixture Models (GMMs), where a speech segment is represented by a compact representation, referred to as “identity vector” (ivector for short), extracted by means of Factor Analysis. The main advantage of this representation is that the problem of intersession variability is deferred to(More)
The i-vector extraction process is affected by several factors such as the noise level, the acoustic content of the observed features, the channel mismatch between the training conditions and the test data, and the duration of the analyzed speech segment. These factors influence both the i-vector estimate and its uncertainty, represented by the i-vector(More)
State-of-the-art systems for text-independent speaker recognition use as their features a compact representation of a speaker utterance, known as "i-vector." We recently presented an efficient approach for training a Pairwise Support Vector Machine (PSVM) with a suitable kernel for i-vector pairs for a quite large speaker recognition task. Rather than(More)
This paper proposes a simple model for speaker recognition based on i–vector pairs, and analyzes its similarity and differences with respect to the state–of–the–art Probabilistic Linear Discriminant Analysis (PLDA) and Pairwise Support Vector Machine (PSVM) models. Similar to the discriminative PSVM approach, we propose a generative model of i–vector pairs,(More)
In this work we assess the recently proposed hybrid Deep Neural Network/Gaussian Mixture Model (DNN/GMM) approach for speaker recognition considering the effects of the granularity of the phonetic DNN model, and of the precision of the corresponding GMM models, which will be referred to as the phonetic GMMs. The aim of this work is to better understand the(More)