We assess the applicability of several popular learning methods for the problem of recognizing generic visual categories with invariance to pose, lighting, and surrounding clutter. A large dataset comprising stereo image pairs of 50 uniform-colored toys under 36 azimuths, 9 elevations, and 6 lighting conditions was collected (for a total of 194,400…
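As a quick check of the arithmetic in the abstract above, the stated factors multiply out to the quoted total if each stereo pair is counted as two images. That reading is an assumption, since the source sentence is truncated before it says what the total counts.

```python
# Factors as stated in the abstract.
toys = 50
azimuths = 36
elevations = 9
lighting_conditions = 6

# Assumption (not confirmed by the truncated text): the total counts
# both images of every stereo pair.
images_per_stereo_pair = 2

stereo_pairs = toys * azimuths * elevations * lighting_conditions
total_images = stereo_pairs * images_per_stereo_pair

print(stereo_pairs)   # 97200
print(total_images)   # 194400
```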
We present an unsupervised method for learning a hierarchy of sparse feature detectors that are invariant to small shifts and distortions. The resulting feature extractor consists of multiple convolution filters, followed by a point-wise sigmoid non-linearity, and a feature-pooling layer that computes the max of each filter output within adjacent windows. A…
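The forward pass of one such stage (filter bank, point-wise sigmoid, max pooling) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the hand-set kernels, and the choice of non-overlapping pooling windows are all assumptions, and the unsupervised learning of the filters is not shown.

```python
import math

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation (the usual convnet convention)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def sigmoid_map(fmap):
    """Apply the logistic sigmoid to every element of a feature map."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in fmap]

def max_pool(fmap, size):
    """Max over non-overlapping size x size windows (an assumption);
    rows and columns that do not fill a whole window are dropped."""
    out_h = len(fmap) // size
    out_w = len(fmap[0]) // size
    return [
        [
            max(fmap[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def extract_features(image, kernels, pool_size=2):
    """One stage: filter bank -> sigmoid -> max pooling, one map per kernel."""
    return [max_pool(sigmoid_map(conv2d_valid(image, k)), pool_size)
            for k in kernels]

# Hypothetical 6x6 input and two hand-set 3x3 kernels (not learned).
image = [[float((r * 6 + c) % 5) for c in range(6)] for r in range(6)]
zero_kernel = [[0.0, 0.0, 0.0] for _ in range(3)]
edge_kernel = [[1.0, 0.0, -1.0] for _ in range(3)]

features = extract_features(image, [zero_kernel, edge_kernel], pool_size=2)
print(len(features), len(features[0]), len(features[0][0]))  # 2 2 2
```

Because each pooled value is the maximum over a window, shifting the input by a pixel often leaves the pooled output unchanged, which is the source of the small-shift invariance the abstract describes.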
The detection and recognition of generic object categories with invariance to viewpoint, illumination, and clutter require the combination of a feature extractor and a classifier. We show that architectures such as convolutional networks are good at learning invariant features, but not always optimal for classification, while Support Vector Machines are…
As we articulate speech, we usually move the head and exhibit various facial expressions. This visual aspect of speech aids understanding and helps communicate additional information, such as the speaker's mood. In this paper we quantitatively analyze the head and facial movements that accompany speech and investigate how they relate to the text's prosodic…
In this paper, we describe a real-time face-tracking algorithm. We start from single-face tracking based on a stochastic color model and a deformable template. We then extend the algorithm to multiple-face tracking based on constraints on the speed and size of the faces. We test the algorithm on sequences with different occlusion patterns, and…