• Publications
Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling
This work proposes a novel framework, the 3D Generative Adversarial Network (3D-GAN), which generates 3D objects from a probabilistic space by leveraging recent advances in volumetric convolutional networks and generative adversarial nets, along with a powerful 3D shape descriptor that has wide applications in 3D object recognition.
Video Enhancement with Task-Oriented Flow
This work proposes task-oriented flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner, which outperforms traditional optical flow on standard benchmarks as well as the Vimeo-90K dataset in three video processing tasks.
Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling
A novel model is designed that simultaneously performs 3D reconstruction and pose estimation; this multi-task learning approach achieves state-of-the-art performance on both tasks.
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
This work proposes a neural-symbolic visual question answering system that first recovers a structural scene representation from the image and a program trace from the question, then executes the program on the scene representation to obtain an answer.
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences from Natural Supervision
We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns visual concepts, words, and semantic parsing of sentences without explicit supervision on any of them; instead, our model…
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
This work proposes a novel approach that models future frames probabilistically, using a Cross Convolutional Network to aid in synthesizing future frames; this network structure encodes image and motion information as feature maps and convolutional kernels, respectively.
Deep multiple instance learning for image classification and auto-annotation
This paper attempts to model deep learning in a weakly supervised (multiple instance learning) framework in which each image follows a dual multi-instance assumption: its object proposals and possible text annotations can be regarded as two instance sets.
Single Image 3D Interpreter Network
This work proposes the 3D INterpreter Network (3D-INN), an end-to-end framework that sequentially estimates 2D keypoint heatmaps and 3D object structure, is trained on both real 2D-annotated images and synthetic 3D data, and achieves state-of-the-art performance on both 2D keypoint estimation and 3D structure recovery.
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
This work introduces CoLlision Events for Video REpresentation and Reasoning (CLEVRER), a diagnostic video dataset for systematic evaluation of computational models on a wide range of reasoning tasks, and evaluates various state-of-the-art visual reasoning models on the benchmark.
MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation
This work formulates the interactive segmentation problem as a multiple instance learning (MIL) task by generating positive bags from pixels of sweeping lines within a bounding box, and develops an algorithm with significant performance and efficiency gains over existing state-of-the-art systems.