Surface Normals in the Wild

@inproceedings{chen2017surface,
  title={Surface Normals in the Wild},
  author={Weifeng Chen and Donglai Xiang and Jia Deng},
  booktitle={2017 IEEE International Conference on Computer Vision (ICCV)},
  year={2017}
}
We study the problem of single-image depth estimation for images in the wild. We propose two novel loss functions for training with surface normal annotations. Experiments on NYU Depth, KITTI, and our own dataset demonstrate that our approach can significantly improve the quality of depth estimation in the wild.
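The loss functions themselves are not reproduced above, but the core idea of supervising depth with normal annotations can be sketched: tangent vectors computed from neighboring depth values should be perpendicular to the annotated surface normal. The following is a minimal sketch, not the paper's exact loss; the function name and the orthographic-camera simplification are assumptions of this sketch.

```python
import numpy as np

def depth_normal_loss(depth, normals):
    """Penalize depth maps whose local tangents are not perpendicular
    to annotated surface normals (orthographic camera assumed).

    depth:   (H, W) array of depth values.
    normals: (H, W, 3) array of unit surface normals per pixel.
    Returns the mean squared dot product between tangents and normals.
    """
    # Tangent along x: (1, 0, dz/dx); tangent along y: (0, 1, dz/dy).
    dzdx = depth[:, 1:] - depth[:, :-1]   # (H, W-1) finite differences
    dzdy = depth[1:, :] - depth[:-1, :]   # (H-1, W)

    n_x = normals[:, :-1]                 # normal at the left pixel of each x-pair
    n_y = normals[:-1, :]                 # normal at the top pixel of each y-pair

    # dot(tangent_x, n) = n[0] + n[2] * dz/dx; zero when the surface agrees.
    dot_x = n_x[..., 0] + n_x[..., 2] * dzdx
    dot_y = n_y[..., 1] + n_y[..., 2] * dzdy
    return float(np.mean(dot_x ** 2) + np.mean(dot_y ** 2))
```

On a planar depth map whose annotated normals match the plane, this loss is zero; annotating the same plane as fronto-parallel yields a positive penalty.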
Monocular Relative Depth Perception with Web Stereo Data Supervision
A simple yet effective method to automatically generate dense relative depth annotations from web stereo images; an improved ranking loss is introduced to handle imbalanced ordinal relations, forcing the network to focus on a set of hard pairs.
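The improved ranking loss is not spelled out above; a common form of pairwise ranking loss for relative depth supervision (the exact reweighting of hard pairs in this paper may differ) looks like this sketch:

```python
import numpy as np

def relative_depth_loss(z_a, z_b, labels):
    """Pairwise ranking loss for relative depth.

    z_a, z_b: predicted depths for the two points of each annotated pair.
    labels:   +1 if point A is farther than B, -1 if closer, 0 if equal.
    Ordered pairs get a logistic ranking term; 'equal' pairs are
    pulled together with a squared difference.
    """
    z_a, z_b, labels = map(np.asarray, (z_a, z_b, labels))
    diff = z_a - z_b
    ordered = labels != 0
    loss = np.where(
        ordered,
        np.log1p(np.exp(-labels * diff)),  # push diff to agree with the label
        diff ** 2,                         # equal pairs: predictions should match
    )
    return float(loss.mean())
```

A pair predicted in the annotated order costs little; the same pair predicted in the wrong order costs more, which is what drives the network toward correct ordinal relations.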
Learning Single-Image Depth From Videos Using Quality Assessment Networks
This paper proposes a method to automatically generate single-view depth training data by running Structure-from-Motion on Internet videos, using a Quality Assessment Network that identifies high-quality reconstructions obtained from SfM.
OASIS: A Large-Scale Dataset for Single Image 3D in the Wild
This work presents Open Annotations of Single Image Surfaces (OASIS), a dataset for single-image 3D in the wild consisting of annotations of detailed 3D geometry for 140,000 images, and expects OASIS to be a useful resource for 3D vision research.
Shape from Polarization for Complex Scenes in the Wild
This work proposes a learning-based framework with a multi-head self-attention module and viewing encoding, which is designed to handle increasing polarization ambiguities caused by complex materials and non-orthographic projection in scene-level SfP.
Deep Surface Normal Estimation With Hierarchical RGB-D Fusion
A hierarchical fusion network with adaptive feature re-weighting is proposed for surface normal estimation from a single RGB-D image, outperforming state-of-the-art normal estimation schemes.
360° Surface Regression with a Hyper-Sphere Loss
This work addresses the unavailability of sufficient 360° ground-truth normal data by leveraging existing 3D datasets, remodelling them via rendering, and training a deep convolutional neural network on the task of monocular 360° surface estimation.
Counterfactual Depth from a Single RGB Image
We describe a method that predicts, from a single RGB image, a depth map that describes the scene when a masked object is removed; we call this "counterfactual depth", as it models scene geometry hidden behind the object.
DeepLiDAR: Deep Surface Normal Guided Depth Prediction for Outdoor Scene From Sparse LiDAR Data and Single Color Image
A deep learning architecture that produces accurate dense depth for outdoor scenes from a single color image and sparse depth input, improving upon state-of-the-art performance on the KITTI depth completion benchmark.
Monocular Depth Estimation via Deep Structured Models with Ordinal Constraints
It is shown that a very limited number of user clicks can greatly boost monocular depth estimation performance and overcome monocular ambiguities, and that inference in the proposed model can be solved efficiently by a feed-forward network.
GroundNet: Monocular Ground Plane Normal Estimation with Geometric Consistency
This model achieves the top-ranked performance on ground plane normal estimation and horizon line detection on the real-world outdoor datasets ApolloScape and KITTI, improving over prior art by up to 17.7% relative.


Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture
  • D. Eigen, R. Fergus
  • Computer Science
    2015 IEEE International Conference on Computer Vision (ICCV)
  • 2015
In this paper we address three different computer vision tasks using a single basic architecture: depth prediction, surface normal estimation, and semantic labeling. We use a multiscale convolutional network that adapts to each task using only small modifications, regressing from the input image directly to the output map.
Single-Image Depth Perception in the Wild
Experiments show that the proposed algorithm, combined with existing RGB-D data and the new relative depth annotations, significantly improves single-image depth perception in the wild.
Discriminatively Trained Dense Surface Normal Estimation
This work proposes a method for dense surface normal estimation from a single image that combines contextual and segment-based cues, building a regressor in a boosting framework by transforming the problem into regression of the coefficients of a local coding.
Unsupervised Monocular Depth Estimation with Left-Right Consistency
This paper proposes a novel training objective that enables the convolutional neural network to learn to perform single image depth estimation, despite the absence of ground truth depth data, and produces state of the art results for monocular depth estimation on the KITTI driving dataset.
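The left-right consistency idea summarized above can be sketched independently of the full training objective: a disparity predicted for the left image, projected into the right view, should match the disparity predicted there. The sketch below is a 1-D, nearest-neighbor simplification with names of our choosing; the paper itself uses differentiable bilinear sampling inside a larger photometric loss.

```python
import numpy as np

def lr_consistency_loss(disp_left, disp_right):
    """Left-right disparity consistency check (per-row, 1-D sampling).

    For each pixel x in the left disparity map, the right disparity
    sampled at x - disp_left[x] should equal disp_left[x].
    """
    H, W = disp_left.shape
    xs = np.arange(W)[None, :].repeat(H, axis=0)
    # Where each left pixel lands in the right image (clamped to bounds).
    sample = np.clip(np.rint(xs - disp_left).astype(int), 0, W - 1)
    rows = np.arange(H)[:, None].repeat(W, axis=1)
    resampled = disp_right[rows, sample]
    return float(np.mean(np.abs(disp_left - resampled)))
```

Two mutually consistent constant-disparity maps give zero loss; any disagreement between the two maps shows up directly in the mean absolute residual.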
Marr Revisited: 2D-3D Alignment via Surface Normal Prediction
A skip-network model built on the pre-trained Oxford VGG convolutional neural network (CNN) for surface normal prediction achieves state-of-the-art accuracy on the NYUv2 RGB-D dataset, and recovers fine object detail compared to previous methods.
Just Look at the Image: Viewpoint-Specific Surface Normal Prediction for Improved Multi-View Reconstruction
  • S. Galliani, K. Schindler
  • Computer Science
    2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
We present a multi-view reconstruction method that combines conventional multi-view stereo (MVS) with appearance-based normal prediction, to obtain dense and accurate 3D surface models.
SURGE: Surface Regularized Geometry Estimation from a Single Image
An approach to regularize 2.5D surface normal and depth predictions at each pixel given a single input image, proposing new planar-wise metrics to evaluate geometry consistency within planar surfaces, which are more tightly related to downstream 3D editing applications.
3-D Depth Reconstruction from a Single Still Image
This work proposes a model that incorporates both monocular cues and stereo (triangulation) cues, to obtain significantly more accurate depth estimates than is possible using either monocular or stereo cues alone.
Depth Map Prediction from a Single Image using a Multi-Scale Deep Network
This paper employs two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally, and applies a scale-invariant error to help measure depth relations rather than scale.
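The scale-invariant error mentioned above has a closed form: with d = log(pred) − log(target), the error is mean(d²) − λ·mean(d)², and with λ = 1 it is unchanged when all predictions are scaled by a constant. A minimal numpy version:

```python
import numpy as np

def scale_invariant_error(pred, target, lam=1.0):
    """Scale-invariant log-depth error of Eigen et al.:
    D = mean(d^2) - lam * mean(d)^2, with d = log(pred) - log(target).
    With lam = 1 the error ignores any global scaling of the prediction,
    so it measures depth *relations* rather than absolute scale.
    """
    d = np.log(pred) - np.log(target)
    return float(np.mean(d ** 2) - lam * np.mean(d) ** 2)
```

Multiplying every predicted depth by the same constant leaves the λ = 1 error unchanged, while per-pixel errors that vary across the image are still penalized.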
ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes
This work introduces ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations, and shows that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks.