Learn More
approach to starting new projects: instead of debating at length about the best way to do something, engineers and executives alike prefer to get going immediately and then iterate and refine the approach. The other principle is that Google primarily seeks to solve really large problems. Capturing, processing, and serving street-level imagery at a global(More)
In this paper, we represent human actions as short sequences of atomic body poses. The knowledge of body pose is stored only implicitly as a set of silhouettes seen from multiple viewpoints; no explicit 3D poses or body models are used, and individual body parts are not identified. Actions and their constituent atomic poses are extracted from a set of(More)
By examining the problem of image correspondence (binocular stereo and optical flow) and its relationship with other modules such as segmentation, shape and depth estimation, occlusion detection, and local signal processing, we argue that early visual modules are entangled in chicken-and-egg relationships , and unraveling these necessitates a compositional(More)
We examine the implications of shape on the process of finding dense correspondence and half-occlusions for a stereo pair of images. The desired property of the disparity map is that it should be a piecewise continuous function which is consistent with the images and which has the minimum number of discontinuities. To zeroth order, piecewise continuity(More)
We examine the key role of occlusions in finding independently moving objects instantaneously in a video obtained by a moving camera with a restricted field of view. In this problem, the image motion is caused by the combined effect of camera motion (egomotion), structure (depth), and the independent motion of scene entities. For a camera with a restricted(More)
We examine the stereo correspondence problem in the presence of slanted scene surfaces. In particular, we highlight a previously overlooked geometric fact: a horizontally slanted surface (i.e. having depth variation in the direction of the separation of the two cameras) will appear horizontally stretched in one image as compared to the other image. Thus,(More)
The ability of a sensor network to parse out observable activities into a set of distinguishable actions is a powerful feature that can potentially enable many applications of sensor networks to everyday life situations. In this paper we introduce a framework that uses a hierarchy of Probabilistic Context Free Grammars (PCFGs) to perform such parsing. The(More)
We present a new real-time approach to object detection that exploits the efficiency of cascade classifiers with the accuracy of deep neural networks. Deep networks have been shown to excel at classification tasks, and their ability to operate on raw pixel input without the need to design special features is very appealing. However, deep nets are(More)