Video Enhancement with Task-Oriented Flow
- Tianfan Xue, Baian Chen, Jiajun Wu, D. Wei, W. Freeman
- Computer Science, International Journal of Computer Vision
- 24 November 2017
Task-oriented flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner, is proposed; it outperforms traditional optical flow on standard benchmarks as well as the Vimeo-90K dataset in three video processing tasks.
Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling
- Jiajun Wu, Chengkai Zhang, Tianfan Xue, Bill Freeman, J. Tenenbaum
- Computer Science, NIPS
- 24 October 2016
A novel framework, the 3D Generative Adversarial Network (3D-GAN), is proposed; it generates 3D objects from a probabilistic space by leveraging recent advances in volumetric convolutional networks and generative adversarial nets. The work also presents a powerful 3D shape descriptor with wide applications in 3D object recognition.
Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling
- Xingyuan Sun, Jiajun Wu, W. Freeman
- Computer Science, IEEE/CVF Conference on Computer Vision and…
- 12 April 2018
A novel model is designed that simultaneously performs 3D reconstruction and pose estimation; this multi-task learning approach achieves state-of-the-art performance on both tasks.
pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis
- Eric Chan, M. Monteiro, Petr Kellnhofer, Jiajun Wu, Gordon Wetzstein
- Computer Science, Computer Vision and Pattern Recognition
- 2 December 2020
We have witnessed rapid progress on 3D-aware image synthesis, leveraging recent advances in generative visual models and neural rendering. Existing approaches, however, fall short in two ways: first,…
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
- Kexin Yi, Jiajun Wu, Chuang Gan, A. Torralba, Pushmeet Kohli, J. Tenenbaum
- Computer Science, Neural Information Processing Systems
- 4 October 2018
This work proposes a neural-symbolic visual question answering system that first recovers a structural scene representation from the image and a program trace from the question, then executes the program on the scene representation to obtain an answer.
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences from Natural Supervision
- Jiayuan Mao, Chuang Gan, Pushmeet Kohli, J. Tenenbaum, Jiajun Wu
- Computer Science, International Conference on Learning…
- 26 April 2019
We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns visual concepts, words, and semantic parsing of sentences without explicit supervision on any of them; instead, our model…
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
- Kexin Yi, Chuang Gan, J. Tenenbaum
- Computer Science, International Conference on Learning…
- 3 October 2019
This work introduces CoLlision Events for Video REpresentation and Reasoning (CLEVRER), a diagnostic video dataset for systematic evaluation of computational models on a wide range of reasoning tasks, and evaluates various state-of-the-art visual reasoning models on the benchmark.
Deep multiple instance learning for image classification and auto-annotation
- Jiajun Wu, Yinan Yu, Chang Huang, Kai Yu
- Computer Science, Computer Vision and Pattern Recognition
- 7 June 2015
This paper attempts to model deep learning in a weakly supervised (multiple instance learning) framework in which each image follows a dual multi-instance assumption: its object proposals and its possible text annotations can be regarded as two instance sets.
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
- Tianfan Xue, Jiajun Wu, K. Bouman, Bill Freeman
- Computer Science, NIPS
- 9 July 2016
A novel approach that models future frames in a probabilistic manner is proposed, together with a Cross Convolutional Network to aid in synthesizing future frames; this network structure encodes image and motion information as feature maps and convolutional kernels, respectively.
Single Image 3D Interpreter Network
- Jiajun Wu, Tianfan Xue, W. Freeman
- Computer Science, European Conference on Computer Vision
- 29 April 2016
This work proposes the 3D INterpreter Network (3D-INN), an end-to-end framework that sequentially estimates 2D keypoint heatmaps and 3D object structure, is trained on both real 2D-annotated images and synthetic 3D data, and achieves state-of-the-art performance on both 2D keypoint estimation and 3D structure recovery.
...