Learn More
We seek to build a large collection of images with ground truth labels to be used for object detection and recognition research. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a web-based tool that allows easy image annotation and instant sharing of such annotations. Using this annotation tool, we have(More)
Given a large dataset of images, we seek to automatically determine the visually similar object and scene classes together with their image segmentation. To achieve this we combine two ideas: (i) that a set of segmented objects can be partitioned into visual object classes using topic discovery models from statistical text analysis; and (ii) that visual(More)
m a ss a c h u se t t s i n st i t u t e o f t e c h n o l o g y, c a m b ri d g e , m a 02139 u s a — w w w. c s a il. mi t. Abstract Given a set of images containing multiple object categories, we seek to discover those categories and their image locations without supervision. We achieve this using generative models from the statistical text literature:(More)
Objects in the world can be arranged into a hierarchy based on their semantic meaning (e.g. organism - animal - feline - cat). What about defining a hierarchy based on the visual appearance of objects? This paper investigates ways to automatically discover a hierarchical structure for the visual world from a collection of unlabeled images. Previous(More)
When a band-pass filter is applied to a natural image, the distribution of the output has a consistent , distinctive form across many different images, with the distribution sharply peaked at zero and relatively heavy-tailed. This prior has been exploited for several image processing tasks. We show how this prior on the appearance of natural images can also(More)
Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a simple approach: by matching the input image , in an appropriate representation, to(More)
We introduce an approach that leverages surface normal predictions, along with appearance cues, to retrieve 3D models for objects depicted in 2D still images from a large CAD object library. Critical to the success of our approach is the ability to recover accurate surface normals for objects in the depicted scene. We introduce a skip-network model built on(More)
This paper poses object category detection in images as a type of 2D-to-3D alignment problem, utilizing the large quantities of 3D CAD models that have been made publicly available online. Using the "chair" class as a running example, we propose an exemplar-based 3D category representation, which can explicitly model chairs of different styles as well as(More)
We introduce an approach for analyzing the variation of features generated by convolutional neural networks (CNNs) trained on large image datasets with respect to scene factors that occur in natural images. Such factors may include object style, 3D viewpoint, color, and scene lighting configuration. Our approach analyzes CNN feature responses with respect(More)