Learn More
We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks(More)
We propose to shift the goal of recognition from naming to describing. Doing so allows us not only to name familiar objects, but also: to report unusual aspects of a familiar object (“spotty dog”, not just “dog”); to say something about unfamiliar objects (“hairy and four-legged”, not just “unknown”);(More)
We present a new approach for modeling multi-modal data sets, focusing on the specific case of segmented images with associated text. Learning the joint distribution of image regions and words has many applications. We consider in detail predicting words associated with whole images (auto-annotation) and corresponding to particular image regions (region(More)
In this paper, we consider the problem of recovering the spatial layout of indoor scenes from monocular images. The presence of clutter is a major problem for existing single-view 3D reconstruction algorithms, most of which rely on finding the ground-wall boundary. In most rooms, this boundary is partially or entirely occluded. We gain robustness to clutter(More)
Humans can prepare concise descriptions of pictures, focus-ing on what they find important. We demonstrate that automatic methods can do so too. We describe a system that can compute a score linking an image to a sentence. This score can be used to attach a descriptive sentence to a given image, or to obtain images that illustrate a given sentence. The(More)
Color constancy is the skill by which it is possible to tell the color of an object even under a colored light. I interpret the color of an object as its color under a fixed canonical light, rather than as a surface reflectance function. This leads to an analysis that shows two distinct sets of circumstances under which color constancy is possible. In this(More)
We show quite good face clustering is possible for a dataset of inaccurately and ambiguously labelled face images. Our dataset is 44,773 face images, obtained by applying a face finder to approximately half a million captioned news images. This dataset is more realistic than usual face recognition datasets, because it contains faces captured "in the wild"(More)