Most state-of-the-art speaker recognition systems are based on Gaussian Mixture Models (GMMs), where a speech segment is represented by a compact representation, referred to as "identity vector" (i-vector for short), extracted by means of Factor Analysis. The main advantage of this representation is that the problem of intersession variability...
Recently, i-vector extraction and Probabilistic Linear Discriminant Analysis (PLDA) have proven to provide state-of-the-art speaker verification performance. In this paper, the speaker verification score for a pair of i-vectors representing a trial is computed with a functional form derived from the successful PLDA generative model. In our case, however, ...
Phonotactic models based on bags of n-grams representations and discriminative classifiers are a popular approach to the language recognition problem. However, the large size of n-gram count vectors brings about some difficulties in discriminative classifiers. The subspace Multinomial model was recently proposed to effectively represent information...
This work presents a new approach to discriminative speaker verification. Rather than estimating speaker models, or a model that discriminates between a speaker class and the class of all the other speakers, we directly solve the problem of classifying pairs of utterances as belonging to the same speaker or not. The paper illustrates the development of a...
The i-vector extraction process is affected by several factors such as the noise level, the acoustic content of the observed features, and the duration of the analyzed speech segment. These factors influence both the i-vector estimate and its uncertainty, represented by the i-vector posterior covariance. This paper presents a new PLDA model that, unlike the...
This paper contains a description of data, systems and fusions developed by the joint team of Brno University of Technology (BUT), Politecnico di Torino (PoliTo) and AGNITIO for the NIST 2011 Language Recognition Evaluation. The primary submission was a fusion of one acoustic and three phonotactic systems, with extensive use of sub-space projections for...
Several applications in Computer Vision, like recognition, identification, automatic 3D modeling and animation, and non-conventional human-computer interaction, require the precise identification of landmark points in facial images. Here we present a fast and robust algorithm capable of identifying a specific set of landmarks on face profile images. First, ...