Learn More
We assess the applicability of several popular learning methods for the problem of recognizing generic visual categories with invariance to pose, lighting, and surrounding clutter. A large dataset comprising stereo image pairs of 50 uniform-colored toys under 36 azimuths, 9 elevations, and 6 lighting conditions was collected (for a total of 194,400(More)
We present an unsupervised method for learning a hierarchy of sparse feature detectors that are invariant to small shifts and distortions. The resulting feature extractor consists of multiple convolution filters, followed by a feature-pooling layer that computes the max of each filter output within adjacent windows, and a point-wise sigmoid non-linearity. A(More)
The detection and recognition of generic object categories with invariance to viewpoint, illumination, and clutter requires the combination of a feature extractor and a classifier. We show that architectures such as convolutional networks are good at learning invariant features, but not always optimal for classification, while Support Vector Machines are(More)
As we articulate speech, we usually move the head and exhibit various facial expressions. This visual aspect of speech aids understanding and helps communicating additional information, such as the speaker's mood. In this paper we analyze quantitatively head and facial movements that accompany speech and investigate how they relate to the text's prosodic(More)
In this paper, we describe a real-time face-tracking algorithm. We start with single face tracking based on statistical color modeling and the deformable template. We then expand the algorithm to track multiple faces, possibly with occlusion, by constraining the speed and size changes of the faces. We test the algorithm on sequences with different occlusion(More)
Concatenative visual speech synthesis selects frames from a large recorded video database of mouth shapes to generate photo-realistic talking head sequences. The synthesized sequence must exhibit precise lip-sound synchronization and smooth articulation. The selection process for finding the best lip shapes has been computationally expensive [1], limiting(More)