Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling
tl;dr
We propose a novel framework, namely 3D Generative Adversarial Network (3D-GAN), which generates 3D objects from a probabilistic space by leveraging recent advances in volumetric convolutional networks and generative adversarial nets.
  • 877
  • 104
  • Open Access
Video Enhancement with Task-Oriented Flow
tl;dr
We propose to learn task-oriented flow (TOFlow) by performing motion analysis and video processing jointly in an end-to-end trainable convolutional network.
  • 127
  • 45
  • Open Access
Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling
tl;dr
We present Pix3D, a large-scale benchmark of diverse image-shape pairs with pixel-level 2D-3D alignment.
  • 116
  • 36
  • Open Access
Single Image 3D Interpreter Network
tl;dr
We propose 3D INterpreter Network (3D-INN), an end-to-end framework for recovering 3D object skeletons, trained on both real 2D-labeled images and synthetic 3D objects.
  • 222
  • 27
  • Open Access
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
tl;dr
We study the problem of synthesizing a number of likely future frames from a single input image.
  • 316
  • 22
  • Open Access
Deep multiple instance learning for image classification and auto-annotation
tl;dr
In this paper, we attempt to model deep learning in a weakly supervised learning framework and apply the learned visual knowledge to assist the task of image classification.
  • 281
  • 21
  • Open Access
MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation
tl;dr
We formulate the interactive segmentation problem as a multiple instance learning (MIL) task by generating positive bags from pixels of sweeping lines within a bounding box.
  • 80
  • 15
  • Open Access
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
tl;dr
We marry two powerful ideas: deep representation learning for visual recognition and language understanding, and symbolic program execution for reasoning, proposing a neural-symbolic approach for visual question answering.
  • 116
  • 13
  • Open Access
MarrNet: 3D Shape Reconstruction via 2.5D Sketches
tl;dr
We propose MarrNet, an end-to-end trainable model that sequentially estimates 2.5D sketches and 3D object shape.
  • 144
  • 12
  • Open Access
Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning
tl;dr
We propose a generative model for physical scene understanding from real-world videos and images.
  • 212
  • 11
  • Open Access