Jan Anguita

Voice imitation is one of the potential threats to security systems that use automatic speaker recognition. Since prosodic features have been considered for state-of-the-art recognition systems in recent years, the question arises as to how vulnerable these features are to voice mimicking. In this study, two experiments are conducted for twelve individual...
In this paper, we address the modality integration issue using the example of a smart room environment, with the aim of enabling person identification by combining acoustic features and 2D face images. First, we introduce the monomodal audio and video identification techniques, and then we present the use of combined input speech and face images for person...
Prosody plays an important role in the human recognition process; therefore, prosodic elements are normally used by impersonators aiming to resemble someone else. Since such voice imitation is one of the potential threats to security systems relying on automatic speaker recognition, and prosodic features have been considered for state-of-the-art recognition...
Jacobian Adaptation (JA) has been successfully used in Automatic Speech Recognition (ASR) systems to adapt the acoustic models from the training to the testing noise conditions. In this work, we present an improvement of JA for speaker verification, where a specific training noise reference is estimated for each speaker model. The new proposal, which...
Jacobian Adaptation (JA) of the acoustic models is a fast adaptation technique that has been successfully used in both speech and speaker recognition systems. This technique adapts the acoustic models on the basis of the difference between the testing and the training noise conditions. For this reason, a noise reference of both the training and the testing...
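As a rough illustration of the adaptation step described above, the sketch below applies a first-order update to a single model mean vector: the mean is shifted by a Jacobian matrix multiplied by the difference between the testing and training noise references. This is a minimal sketch and not the authors' implementation; the function name `jacobian_adapt`, the toy values, and the assumption that the Jacobian has already been computed for the training condition are all hypothetical.

```python
def jacobian_adapt(mean_train, jacobian, noise_train, noise_test):
    """Shift a model mean by J * (noise_test - noise_train).

    All arguments are plain Python lists; `jacobian` is a list of rows.
    The Jacobian is assumed to be precomputed (hypothetical input).
    """
    # Difference between the testing and training noise references.
    delta = [t - r for t, r in zip(noise_test, noise_train)]
    # Matrix-vector product J * delta, added to the training mean.
    return [
        m + sum(j * d for j, d in zip(row, delta))
        for m, row in zip(mean_train, jacobian)
    ]


# Toy two-dimensional example with made-up values.
adapted = jacobian_adapt(
    mean_train=[1.0, 2.0],
    jacobian=[[0.5, 0.0], [0.0, 0.25]],
    noise_train=[0.0, 0.0],
    noise_test=[2.0, 4.0],
)
unchanged = jacobian_adapt([1.0, 2.0], [[0.5, 0.0], [0.0, 0.25]], [1.0, 1.0], [1.0, 1.0])
```

When the testing and training noise references coincide, the update leaves the mean unchanged, which is why both references are needed.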
Error Rate of less than 2%. Abstract: In this work, we investigate new inter-phone and inter-word distances, and we apply them to predict whether two words of the lexicon of an Automatic Speech Recognition (ASR) system are likely to be confused. The inter-word distance is calculated from an alignment between the phonetic transcriptions of the words by adding the...
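One way to picture an alignment-based inter-word distance is a weighted edit distance over phone sequences, sketched below. The abstract is truncated, so this is only an assumption about the general idea: substitution costs come from an inter-phone distance and are summed along the best alignment. The function names, the toy `CONFUSABLE` set, and the cost values are hypothetical and are not taken from the paper.

```python
# Hypothetical set of easily confused phone pairs (illustration only).
CONFUSABLE = {frozenset(("b", "p")), frozenset(("d", "t"))}


def phone_distance(a, b):
    """Toy inter-phone distance: 0 if equal, 0.5 if confusable, else 1."""
    if a == b:
        return 0.0
    return 0.5 if frozenset((a, b)) in CONFUSABLE else 1.0


def inter_word_distance(phones_a, phones_b, indel_cost=1.0):
    """Weighted edit distance between two phonetic transcriptions."""
    n, m = len(phones_a), len(phones_b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitute = d[i - 1][j - 1] + phone_distance(phones_a[i - 1], phones_b[j - 1])
            delete = d[i - 1][j] + indel_cost
            insert = d[i][j - 1] + indel_cost
            d[i][j] = min(substitute, delete, insert)
    return d[n][m]


bat_vs_pat = inter_word_distance(["b", "ae", "t"], ["p", "ae", "t"])
cat_vs_ca = inter_word_distance(["k", "ae", "t"], ["k", "ae"])
```

Under these toy costs, word pairs with a small distance would be flagged as likely to be confused; the threshold for that decision is left open here.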