• Publications
Learning a Predictable and Generative Vector Representation for Objects
TLDR
We propose a novel architecture, called the TL-embedding network, to learn an embedding space that is generative in 3D and predictable from 2D.
  • 415
  • 51
Designing deep networks for surface normal estimation
TLDR
We use CNNs for the task of predicting surface normals from a single image.
  • 249
  • 19
Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene
TLDR
The goal of this paper is to take a single 2D image of a scene and recover the 3D structure in terms of a small set of factors: a layout representing the enclosing surfaces, and a set of objects represented in terms of shape and pose.
  • 64
  • 9
From Lifestyle Vlogs to Everyday Interactions
TLDR
A major stumbling block to progress in understanding basic human interactions, such as getting out of bed or opening a refrigerator, is the lack of good training data.
  • 52
  • 9
Data-Driven 3D Primitives for Single Image Understanding
TLDR
We propose data-driven geometric primitives that are both visually discriminative and geometrically informative, that is, they convey information about the 3D world when recognized.
  • 141
  • 8
Representative elementary volume estimation for porosity, moisture saturation, and air‐water interfacial areas in unsaturated porous media: Data quality implications
Achieving a representative elementary volume (REV) has become a de facto criterion for demonstrating the quality of CT measurements in porous media systems. However, the data quality implications …
  • 108
  • 6
Scene Semantics from Long-Term Observation of People
TLDR
We describe scene objects (sofas, tables, chairs) by associated human poses and object appearance.
  • 103
  • 5
People Watching: Human Actions as a Cue for Single View Geometry
TLDR
We present an approach that exploits the coupling between human actions and scene geometry to use human pose as a cue for single-view 3D scene understanding.
  • 137
  • 4
Unfolding an Indoor Origami World
TLDR
We propose the use of mid-level constraints for 3D scene understanding in the form of convex and concave edges and introduce a generic framework capable of incorporating these constraints.
  • 73
  • 4
Cross-Task Weakly Supervised Learning From Instructional Videos
TLDR
We develop a component model for recognizing steps and a weakly supervised learning framework that can learn this model under temporal constraints from narration and the list of steps.
  • 33
  • 4