• Publications
Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling
TLDR
A novel framework, namely 3D Generative Adversarial Network (3D-GAN), which generates 3D objects from a probabilistic space by leveraging recent advances in volumetric convolutional networks and generative adversarial nets, and a powerful 3D shape descriptor which has wide applications in 3D object recognition.
Video Enhancement with Task-Oriented Flow
TLDR
Task-oriented flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner, is proposed; it outperforms traditional optical flow on standard benchmarks as well as the Vimeo-90K dataset in three video processing tasks.
Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling
TLDR
A novel model is designed that simultaneously performs 3D reconstruction and pose estimation; this multi-task learning approach achieves state-of-the-art performance on both tasks.
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
TLDR
This work proposes a neural-symbolic visual question answering system that first recovers a structural scene representation from the image and a program trace from the question, then executes the program on the scene representation to obtain an answer.
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences from Natural Supervision
We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns visual concepts, words, and semantic parsing of sentences without explicit supervision on any of them; instead, our model ...
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
TLDR
A novel approach that models future frames in a probabilistic manner is proposed, using a Cross Convolutional Network to aid in synthesizing future frames; this network structure encodes image and motion information as feature maps and convolutional kernels, respectively.
Deep multiple instance learning for image classification and auto-annotation
TLDR
This paper attempts to model deep learning in a weakly supervised (multiple instance learning) framework in which each image follows a dual multi-instance assumption: its object proposals and its possible text annotations can be regarded as two instance sets.
Single Image 3D Interpreter Network
TLDR
This work proposes 3D INterpreter Network (3D-INN), an end-to-end framework which sequentially estimates 2D keypoint heatmaps and 3D object structure, trained on both real 2D-annotated images and synthetic 3D data, and achieves state-of-the-art performance on both 2D keypoint estimation and 3D structure recovery.
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
TLDR
This work introduces CoLlision Events for Video REpresentation and Reasoning (CLEVRER), a diagnostic video dataset for systematic evaluation of computational models on a wide range of reasoning tasks, and evaluates various state-of-the-art visual reasoning models on this benchmark.
MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation
TLDR
This work formulates the interactive segmentation problem as a multiple instance learning (MIL) task by generating positive bags from pixels of sweeping lines within a bounding box, and develops an algorithm with significant performance and efficiency gains over existing state-of-the-art systems.