This article describes a general and powerful approach to modelling mismatch in speaker recognition by including an explicit session term in the Gaussian mixture speaker modelling framework. Under this approach, the Gaussian mixture model (GMM) that best represents the observations of a particular recording is the combination of the true speaker model with…
In this paper we present a trainable speech synthesis system that uses the trended Hidden Markov Model to generate the trajectories of spectral features of synthesis units. The synthesis units are trained from a transcribed continuous speech corpus, making the speech more natural than that produced by conventional diphone synthesisers which are generally…
In this paper, we present an approach we refer to as "least squares congealing" which provides a solution to the problem of aligning an ensemble of images in an unsupervised manner. Our approach circumvents many of the limitations existing in the canonical "congealing" algorithm. Specifically, we present an algorithm that: (i) is able to…
The problem of determining the script and language of a document image has a number of important applications in the field of document analysis, such as indexing and sorting of large collections of such images, or as a precursor to optical character recognition (OCR). In this paper, we investigate the use of texture as a tool for determining the script of a…
(2011) Gait energy volumes and frontal gait recognition using depth images. © 2011 IEEE.
This paper investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. It has been previously shown in our own work, and in the work of others, that features extracted from a speaker's moving lips hold speaker dependencies which are complementary to speech features.
Person re-identification involves recognising individuals in different locations across a network of cameras and is a challenging task due to a large number of varying factors such as pose (both subject and camera) and ambient lighting conditions. Existing databases do not adequately capture these variations, making evaluations of proposed techniques…
The addition of three-dimensional (3D) data has the potential to greatly improve the accuracy of face recognition technologies by providing complementary information. In this paper, a new method is presented that combines intensity and range images and provides insensitivity to expression variation based on Log-Gabor templates. By breaking a single image into…
Automatically recognizing pain from video is a useful application because it can alert carers to patients in discomfort who would otherwise be unable to communicate it (e.g., young children or patients in postoperative care). In previous work [1], a "pain-no pain" system was developed which used an AAM-SVM approach…