Ran D. Zilca

This paper presents an overview of the architecture and algorithms implemented in IBM's text-independent speaker verification system developed for the 2002 NIST Speaker Recognition Evaluation, particularly for the 1-speaker detection task using cellular test data. We describe individual components including a Gaussianization front-end, cellular-codec ...
Conversational Biometrics combines acoustic speaker verification with conversational knowledge verification to make a more accurate identity decision. To manage the added level of complexity that the multi-modal user recognition approach introduces, this paper proposes the use of verification policies, in the form of Finite State Machines, which can be used ...
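As a rough illustration of a verification policy expressed as a finite state machine, the sketch below combines an acoustic score with the correctness of a knowledge answer at each dialog turn. The state names, the thresholds `HIGH` and `LOW`, the event labels, and the transition table are all hypothetical and chosen only for illustration; the abstract does not specify the paper's actual policies.

```python
# Hypothetical thresholds on the acoustic verification score.
HIGH = 2.0
LOW = -2.0

# Hypothetical policy: ask at most two questions before deciding.
TRANSITIONS = {
    ("Q1", "pass"): "ACCEPT",
    ("Q1", "fail"): "REJECT",
    ("Q1", "unsure"): "Q2",
    ("Q2", "pass"): "ACCEPT",
    ("Q2", "fail"): "REJECT",
    ("Q2", "unsure"): "REJECT",
}
TERMINAL = {"ACCEPT", "REJECT"}


def classify(acoustic_score, answer_correct):
    """Map one dialog turn to an event label used by the transition table."""
    if answer_correct and acoustic_score >= HIGH:
        return "pass"
    if not answer_correct and acoustic_score <= LOW:
        return "fail"
    return "unsure"


def run_policy(turns):
    """Run the state machine over (acoustic_score, answer_correct) turns.

    Returns the state reached; a non-terminal state means the policy
    still needs another turn before it can decide.
    """
    state = "Q1"
    for acoustic_score, answer_correct in turns:
        state = TRANSITIONS[(state, classify(acoustic_score, answer_correct))]
        if state in TERMINAL:
            break
    return state
```

Keeping the policy in a table rather than in nested conditionals means a stricter or more lenient policy can be swapped in by editing `TRANSITIONS` alone.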
This paper describes a computationally simple method to perform text-independent speaker verification using second-order statistics. The suggested method, called utterance level scoring (ULS), yields a normalized score in a single pass through the frames of the tested utterance. The utterance sample covariance is first calculated and then ...
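The first step mentioned here, computing the utterance sample covariance in one pass over the frames, can be sketched as follows. This is a minimal sketch, not the paper's implementation: the function name `utterance_covariance` and the list-of-lists frame format are assumptions, and the sketch stops at the covariance because the abstract is cut off before it says how the score is derived.

```python
def utterance_covariance(frames):
    """Return (mean, sample covariance) from one pass over the frames.

    `frames` is any iterable of equal-length feature vectors. Only the
    running sum of vectors and the running sum of outer products are kept,
    so the frames never need to be stored or revisited.
    """
    n = 0
    total = None          # running sum of x
    outer = None          # running sum of x * x^T
    for x in frames:
        if total is None:
            dim = len(x)
            total = [0.0] * dim
            outer = [[0.0] * dim for _ in range(dim)]
        n += 1
        for i in range(dim):
            total[i] += x[i]
            for j in range(dim):
                outer[i][j] += x[i] * x[j]
    if n < 2:
        raise ValueError("need at least two frames")
    mean = [s / n for s in total]
    # Unbiased estimate: (sum(x x^T) - n * mean mean^T) / (n - 1)
    cov = [
        [(outer[i][j] - n * mean[i] * mean[j]) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]
    return mean, cov
```

The sum-of-products form is simple but can lose precision when the feature means are large relative to their spread; an incremental (Welford-style) update is a common alternative in that case.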
Pitch mismatch between enrollment and testing is a common problem in speaker recognition systems. It is well known that the fine spectral structure related to fundamental frequency manifests itself in Mel cepstral features used for speaker recognition. Therefore pitch variations result in variation of the acoustic features, and potentially an increase in ...
Speaker recognition systems employ a speech detection algorithm and use only frames detected as speech for further processing. The accuracy obtained by a speaker recognition system depends on the method that is used to detect speech, in particular for real-life deployments where the incoming speech varies significantly in loudness and noise characteristics. ...
The paper considers text-independent speaker identification over the telephone using short training and testing data. Gaussian Mixture Modeling (GMM) is used in the testing phase, but the parameters of the model are taken from clusters obtained for the training data by an adequate choice of feature vectors and a distance measure without optimization in the ...
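One way to read the idea of taking GMM parameters directly from clusters, with no iterative re-estimation, is sketched below. The cluster labels are assumed to be given already, since the abstract does not name the clustering method or the distance measure. The diagonal covariances, the variance floor `var_floor`, and both function names are assumptions made for illustration.

```python
import math


def gmm_from_clusters(vectors, labels, var_floor=1e-3):
    """Build (weight, mean, variance) components from hard cluster labels."""
    clusters = {}
    for x, k in zip(vectors, labels):
        clusters.setdefault(k, []).append(x)
    total = len(vectors)
    model = []
    for members in clusters.values():
        n = len(members)
        dim = len(members[0])
        mean = [sum(x[d] for x in members) / n for d in range(dim)]
        # Per-dimension variance, floored so small clusters stay usable.
        var = [
            max(sum((x[d] - mean[d]) ** 2 for x in members) / n, var_floor)
            for d in range(dim)
        ]
        model.append((n / total, mean, var))
    return model


def log_likelihood(model, x):
    """Log-likelihood of one feature vector under the diagonal GMM."""
    terms = []
    for weight, mean, var in model:
        lp = math.log(weight)
        for xd, m, v in zip(x, mean, var):
            lp += -0.5 * (math.log(2.0 * math.pi * v) + (xd - m) ** 2 / v)
        terms.append(lp)
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

In an identification setting, one such model would be built per enrolled speaker, and a test utterance would be assigned to the speaker whose model gives the highest summed frame log-likelihood.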