Publications
LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop
TLDR
This work proposes to amplify human effort through a partially automated labeling scheme, leveraging deep learning with humans in the loop, and constructs a new image dataset, LSUN, which contains around one million labeled images for each of 10 scene categories and 20 object categories.
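The labeling loop can be made concrete with a short sketch. Below is a minimal, illustrative version of such a humans-in-the-loop scheme: a classifier trained on a human-labeled seed set auto-accepts high-confidence predictions and routes only the ambiguous middle band to annotators. The feature inputs, thresholds, and the `ask_humans` crowdsourcing callback are assumptions for illustration, not the paper's exact pipeline.

```python
# Illustrative humans-in-the-loop labeling loop (not the paper's exact scheme).
import numpy as np
from sklearn.linear_model import LogisticRegression

def label_pool(seed_X, seed_y, pool_X, ask_humans, lo=0.05, hi=0.95, rounds=3):
    """Auto-label confident images; route ambiguous ones to human annotators."""
    X, y = seed_X.copy(), seed_y.copy()
    labels = np.full(len(pool_X), -1)                 # -1 = still unlabeled
    for _ in range(rounds):
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        unlabeled = np.where(labels == -1)[0]
        if len(unlabeled) == 0:
            break
        p = clf.predict_proba(pool_X[unlabeled])[:, 1]
        labels[unlabeled[p >= hi]] = 1                # confident positives: accept
        labels[unlabeled[p <= lo]] = 0                # confident negatives: reject
        ambiguous = unlabeled[(p > lo) & (p < hi)][:1000]  # cap human workload
        human_y = ask_humans(ambiguous)               # hypothetical crowd call
        labels[ambiguous] = human_y
        X = np.vstack([X, pool_X[ambiguous]])         # grow the training set
        y = np.concatenate([y, human_y])
    return labels
```

Each round shrinks the ambiguous band, so human effort is spent only where the model is uncertain, which is the amplification the summary describes.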
Matterport3D: Learning from RGB-D Data in Indoor Environments
TLDR
Matterport3D is introduced, a large-scale RGB-D dataset containing 10,800 panoramic views from 194,400 RGB-D images of 90 building-scale scenes, which enables a variety of supervised and self-supervised computer vision tasks, including keypoint matching, view overlap prediction, normal prediction from color, semantic segmentation, and region classification.
Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images
TLDR
An end-to-end deep learning architecture that produces a 3D shape as a triangular mesh from a single color image by progressively deforming an ellipsoid, guided by perceptual features extracted from the input image.
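A rough sketch of one deformation step under this design: project the current vertices into the image, bilinearly pool CNN features at each projection, and regress per-vertex offsets. The pinhole projection, layer sizes, and the neighbor-averaging stand-in for a graph convolution are illustrative assumptions, not the paper's architecture.

```python
# One illustrative coarse-to-fine deformation step (simplified stand-in).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformBlock(nn.Module):
    def __init__(self, feat_dim, hidden=128):
        super().__init__()
        self.fc1 = nn.Linear(feat_dim + 3, hidden)
        self.fc2 = nn.Linear(hidden, 3)

    def forward(self, verts, adj, feat_map, focal=248.0):
        # Pinhole projection of vertices (V, 3) into normalized image coords.
        uv = focal * verts[:, :2] / verts[:, 2:3].clamp(min=1e-5)
        grid = uv.reshape(1, -1, 1, 2) / (feat_map.shape[-1] / 2)  # -> [-1, 1]
        # Bilinearly pool perceptual features at each projected vertex.
        pooled = F.grid_sample(feat_map, grid, align_corners=False)
        pooled = pooled.squeeze(0).squeeze(-1).t()                 # (V, C)
        h = F.relu(self.fc1(torch.cat([verts, pooled], dim=1)))
        h = adj @ h                    # row-normalized adjacency ~ graph conv
        return verts + self.fc2(h)     # per-vertex offsets deform the mesh
```

Stacking several such blocks, each followed by mesh unpooling, gives the progressive ellipsoid-to-shape deformation the summary describes.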
Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks
TLDR
This work introduces a large-scale synthetic dataset with 500K physically-based rendered images from 45K realistic 3D indoor scenes and shows that pretraining on this synthetic dataset can improve results beyond the current state of the art on all three computer vision tasks evaluated.
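The transfer recipe this result relies on is the usual two-stage schedule: pretrain on the large synthetic set, then fine-tune on real data at a lower learning rate. A hedged sketch, with the model, loaders, and hyperparameters as placeholders rather than the paper's setup:

```python
# Generic pretrain-on-synthetic / fine-tune-on-real loop (placeholder setup).
import torch

def pretrain_then_finetune(model, synth_loader, real_loader, loss_fn,
                           pretrain_epochs=10, finetune_epochs=5):
    opt = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
    for _ in range(pretrain_epochs):            # stage 1: synthetic images
        for images, targets in synth_loader:
            opt.zero_grad()
            loss_fn(model(images), targets).backward()
            opt.step()
    for g in opt.param_groups:                  # stage 2: real images,
        g["lr"] = 1e-3                          # smaller steps preserve pretraining
    for _ in range(finetune_epochs):
        for images, targets in real_loader:
            opt.zero_grad()
            loss_fn(model(images), targets).backward()
            opt.step()
    return model
```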
DeepLiDAR: Deep Surface Normal Guided Depth Prediction for Outdoor Scene From Sparse LiDAR Data and Single Color Image
TLDR
A deep learning architecture that produces accurate dense depth for outdoor scenes from a single color image and sparse depth measurements, improving upon the state-of-the-art performance on the KITTI depth completion benchmark.
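As a rough illustration of normal-guided completion, the sketch below predicts surface normals from the color image and sparse depth, conditions a depth branch on them, and keeps raw LiDAR measurements wherever they exist. The tiny convolutional branches and the hard fusion mask are simplified stand-ins for the paper's network.

```python
# Simplified normal-guided depth completion (stand-in for the paper's model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalGuidedDepth(nn.Module):
    def __init__(self):
        super().__init__()
        # color (3ch) + sparse depth (1ch) -> surface normals (3ch)
        self.normal_branch = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1))
        # color + sparse depth + normals (7ch) -> dense depth (1ch)
        self.depth_branch = nn.Sequential(
            nn.Conv2d(7, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, rgb, sparse_depth):
        x = torch.cat([rgb, sparse_depth], dim=1)
        normals = F.normalize(self.normal_branch(x), dim=1)
        depth = self.depth_branch(torch.cat([x, normals], dim=1))
        mask = (sparse_depth > 0).float()   # trust raw LiDAR where available
        return mask * sparse_depth + (1 - mask) * depth
```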
Deep Depth Completion of a Single RGB-D Image
TLDR
A deep network is trained that takes an RGB image as input and predicts dense surface normals and occlusion boundaries, which are then combined with the raw depth observations from the RGB-D camera to solve for the depth of all pixels, including those missing from the original observation.
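The final solve is a sparse linear least-squares problem, and a simplified version makes it concrete: a data term ties the solution to raw observations, and gradient terms derived from the predicted normals are attenuated across predicted occlusion boundaries. The gradient targets and weighting below are simplifications of the paper's objective, not its exact terms.

```python
# Simplified global depth solve (data term + boundary-weighted gradient term).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

def solve_depth(obs, obs_mask, gx, gy, boundary, lam=1.0):
    """obs: (H, W) raw depth; obs_mask: 1 where observed; gx, gy: target
    gradients derived from normals; boundary: (H, W) edge strength in [0, 1]."""
    H, W = obs.shape
    idx = np.arange(H * W).reshape(H, W)
    rows, cols, vals, rhs = [], [], [], []

    def add_eq(terms, b):                 # terms: [(pixel_index, coeff), ...]
        r = len(rhs)
        for c, v in terms:
            rows.append(r); cols.append(c); vals.append(v)
        rhs.append(b)

    for p in idx[obs_mask > 0]:           # data term: match raw observations
        add_eq([(p, 1.0)], float(obs.ravel()[p]))
    for y in range(H):                    # horizontal gradients from normals,
        for x in range(W - 1):            # down-weighted across boundaries
            w = lam * (1.0 - boundary[y, x])
            add_eq([(idx[y, x + 1], w), (idx[y, x], -w)], w * gx[y, x])
    for y in range(H - 1):                # vertical gradients likewise
        for x in range(W):
            w = lam * (1.0 - boundary[y, x])
            add_eq([(idx[y + 1, x], w), (idx[y, x], -w)], w * gy[y, x])

    A = sp.csr_matrix((vals, (rows, cols)), shape=(len(rhs), H * W))
    return lsqr(A, np.asarray(rhs))[0].reshape(H, W)
```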
PanoContext: A Whole-Room 3D Context Model for Panoramic Scene Understanding
TLDR
Experiments show that, based solely on 3D context and without any image region category classifier, the proposed whole-room context model achieves performance comparable to the state-of-the-art object detector, demonstrating that when the field of view is large, context is as powerful as object appearance.
DIST: Rendering Deep Implicit Signed Distance Function With Differentiable Sphere Tracing
TLDR
This work proposes a differentiable sphere tracing algorithm that can effectively reconstruct accurate 3D shapes from various inputs, such as sparse depth and multi-view images, through inverse optimization, and shows excellent generalization capability and robustness to various types of noise.
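The core of the method is sphere tracing made differentiable by querying a learned SDF: each ray advances by the signed distance at its current point, and the resulting surface point stays a function of the network weights, so gradients can flow back for inverse optimization. A minimal sketch, assuming `sdf` is any callable network mapping (N, 3) points to (N,) signed distances; the step count and tolerance are illustrative.

```python
# Minimal differentiable sphere tracing against a learned SDF.
import torch

def sphere_trace(sdf, origins, dirs, n_steps=50, eps=1e-3, t_max=5.0):
    t = torch.zeros(origins.shape[0], device=origins.device)
    hit = torch.zeros_like(t, dtype=torch.bool)
    for _ in range(n_steps):
        p = origins + t.unsqueeze(-1) * dirs    # current point on each ray
        d = sdf(p)                              # network query keeps autograd
        hit = hit | (d.abs() < eps)             # converged to the surface
        active = (~hit) & (t < t_max)           # still marching, inside volume
        t = torch.where(active, t + d, t)       # step by the signed distance
    return origins + t.unsqueeze(-1) * dirs, hit
```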
TurkerGaze: Crowdsourcing Saliency with Webcam based Eye Tracking
TLDR
This paper introduces a webcam-based gaze tracking system that supports large-scale, crowdsourced eye tracking deployed on Amazon Mechanical Turk (AMTurk), and builds a saliency dataset for a large number of natural images.
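At its core, a webcam tracker like this reduces to a per-user calibration regression from eye appearance to screen position. The sketch below uses raw downsampled eye patches and ridge regression; the feature representation and regressor are illustrative assumptions rather than the system's exact pipeline.

```python
# Illustrative gaze calibration: eye-patch features -> screen coordinates.
import numpy as np
from sklearn.linear_model import Ridge

def calibrate(eye_patches, screen_xy, alpha=1.0):
    """eye_patches: (N, h, w) grayscale crops captured while the user fixates
    known calibration targets; screen_xy: (N, 2) target positions."""
    X = eye_patches.reshape(len(eye_patches), -1).astype(np.float64)
    X /= X.std(axis=1, keepdims=True) + 1e-8    # crude illumination normalization
    return Ridge(alpha=alpha).fit(X, screen_xy)

def predict_gaze(model, eye_patch):
    x = eye_patch.reshape(1, -1).astype(np.float64)
    x /= x.std() + 1e-8
    return model.predict(x)[0]                  # estimated (x, y) on screen
```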
ActiveStereoNet: End-to-End Self-Supervised Learning for Active Stereo Systems
TLDR
This paper presents ActiveStereoNet, the first fully self-supervised deep learning solution for active stereo systems; it produces precise depth with subpixel precision, does not suffer from common over-smoothing issues, preserves edges, and explicitly handles occlusions.
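The self-supervision signal can be sketched directly: warp the right image into the left view with the predicted disparity, penalize the photometric error, and invalidate pixels that fail a left-right disparity consistency check. The paper's actual loss is more elaborate (a locally normalized reconstruction cost and a learned invalidation network), so the version below is a simplified stand-in.

```python
# Simplified self-supervised stereo loss: warp right->left, mask inconsistencies.
import torch
import torch.nn.functional as F

def warp_right_to_left(right, disp):
    """right: (B, C, H, W); disp: (B, 1, H, W) left-view disparity in pixels."""
    B, _, H, W = right.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    xs = xs.to(right) - disp.squeeze(1)          # shift sample column by disparity
    grid = torch.stack([2 * xs / (W - 1) - 1,
                        2 * ys.to(right).expand(B, -1, -1) / (H - 1) - 1], dim=-1)
    return F.grid_sample(right, grid, align_corners=True)

def self_supervised_loss(left, right, disp_l, disp_r, thresh=1.0):
    recon = warp_right_to_left(right, disp_l)    # right image seen from the left
    disp_r_warped = warp_right_to_left(disp_r, disp_l)
    valid = (disp_l - disp_r_warped).abs() < thresh   # left-right consistency
    return ((left - recon).abs() * valid).sum() / valid.sum().clamp(min=1)
```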