Learn More
We assess the applicability of several popular learning methods for the problem of recognizing generic visual categories with invariance to pose, lighting, and surrounding clutter. A large dataset comprising stereo image pairs of 50 uniform-colored toys under 36 azimuths, 9 elevations, and 6 lighting conditions was collected (for a total of 194,400(More)
We present an unsupervised method for learning a hierarchy of sparse feature detectors that are invariant to small shifts and distortions. The resulting feature extractor consists of multiple convolution filters, followed by a feature-pooling layer that computes the max of each filter output within adjacent windows, and a point-wise sigmoid non-linearity. A(More)
The detection and recognition of generic object categories with invariance to viewpoint, illumination, and clutter requires the combination of a feature extractor and a classifier. We show that architectures such as convolutional networks are good at learning invariant features, but not always optimal for classification, while Support Vector Machines are(More)
As we articulate speech, we usually move the head and exhibit various facial expressions. This visual aspect of speech aids understanding and helps communicating additional information, such as the speaker’s mood. In this paper we analyze quantitatively head and facial movements that accompany speech and investigate how they relate to the text’s prosodic
In this paper, we describe a real-time face-tracking algorithm. We start with single face tracking based on statistical color modeling and the deformable template. We then expand the algorithm to track multiple faces, possibly with occlusion, by constraining the speed and size changes of the faces. We test the algorithm on sequences with different occlusion(More)
The machine learning and pattern recognition communities are facing two challenges: solving the normalization problem, and solving the deep learning problem. The normalization problem is related to the difficulty of training probabilistic models over large spaces while keeping them properly normalized. In recent years, the ML and natural language(More)