Surface Normals in the Wild

Weifeng Chen, Donglai Xiang, Jia Deng. 2017 IEEE International Conference on Computer Vision (ICCV).
We study the problem of single-image depth estimation for images in the wild. We propose two novel loss functions for training with surface normal annotations. Experiments on NYU Depth, KITTI, and our own dataset demonstrate that our approach can significantly improve the quality of depth estimation in the wild.
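The abstract does not spell out the two loss functions. As a hedged illustration only (not the paper's exact formulation), a common way to supervise depth with normal annotations is to derive normals from the predicted depth via finite differences and penalize angular deviation; the function names and the simplified orthographic gradient model below are assumptions:

```python
import numpy as np

def normals_from_depth(depth):
    """Approximate surface normals from a depth map via finite differences.
    Uses a simplified orthographic-style gradient model (an assumption,
    not the paper's camera model)."""
    dz_dy, dz_dx = np.gradient(depth)
    # A surface z(x, y) has normal proportional to (-dz/dx, -dz/dy, 1).
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def angular_normal_loss(depth_pred, normals_gt):
    """Mean (1 - cosine similarity) between depth-derived and annotated normals."""
    n_pred = normals_from_depth(depth_pred)
    cos = np.sum(n_pred * normals_gt, axis=-1)
    return float(np.mean(1.0 - cos))
```

For a flat depth map with upward-facing annotated normals the loss is zero, and it grows as the annotated normals rotate away from the depth-derived ones.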
Monocular Relative Depth Perception with Web Stereo Data Supervision
A simple yet effective method to automatically generate dense relative depth annotations from web stereo images, and an improved ranking loss is introduced to deal with imbalanced ordinal relations, encouraging the network to focus on a set of hard pairs.
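The improved ranking loss itself is not reproduced here, but the standard relative-depth ranking loss it builds on can be sketched as follows; the hard-pair weighting is omitted and the function name is illustrative:

```python
import numpy as np

def ranking_loss(z_i, z_j, order):
    """Relative-depth ranking loss for one annotated point pair.
    order = +1 if point i is annotated as farther than j,
            -1 if closer, 0 if the pair is roughly equal in depth."""
    if order == 0:
        # Equal-depth pairs: penalize any predicted difference quadratically.
        return (z_i - z_j) ** 2
    # Ordered pairs: logistic loss on the signed predicted depth difference.
    return np.log(1.0 + np.exp(-order * (z_i - z_j)))
```

Predictions that agree with the annotated ordering incur a small loss, while pairs predicted in the wrong order incur a large one, which is what makes mining the hard (high-loss) pairs effective.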
Learning Single-Image Depth From Videos Using Quality Assessment Networks
This paper proposes a method to automatically generate single-view depth training data by running Structure-from-Motion on Internet videos, using a Quality Assessment Network to identify high-quality SfM reconstructions.
OASIS: A Large-Scale Dataset for Single Image 3D in the Wild
This work presents Open Annotations of Single Image Surfaces (OASIS), a dataset for single-image 3D in the wild consisting of annotations of detailed 3D geometry for 140,000 images, and expects OASIS to be a useful resource for 3D vision research.
Shape from Polarization for Complex Scenes in the Wild
This work proposes a learning-based framework with a multi-head self-attention module and viewing encoding, which is designed to handle increasing polarization ambiguities caused by complex materials and non-orthographic projection in scene-level SfP.
Deep Surface Normal Estimation With Hierarchical RGB-D Fusion
A hierarchical fusion network with adaptive feature re-weighting is proposed for surface normal estimation from a single RGB-D image, outperforming state-of-the-art normal estimation schemes.
360° Surface Regression with a Hyper-Sphere Loss
This work addresses the unavailability of sufficient 360° ground truth normal data, by leveraging existing 3D datasets and remodelling them via rendering and training a deep convolutional neural network on the task of monocular 360° surface estimation.
Counterfactual Depth from a Single RGB Image
We describe a method that predicts, from a single RGB image, a depth map that describes the scene when a masked object is removed; we call this "counterfactual depth", as it models the hidden parts of the scene.
DeepLiDAR: Deep Surface Normal Guided Depth Prediction for Outdoor Scene From Sparse LiDAR Data and Single Color Image
A deep learning architecture that produces accurate dense depth for outdoor scenes from a single color image and sparse depth measurements, improving upon state-of-the-art performance on the KITTI depth completion benchmark.
GroundNet: Monocular Ground Plane Estimation with Geometric Consistency
This work focuses on the problem of estimating the 3D orientation of the ground plane from a single image (monocular vision) and proposes to add a consistency loss on top of the computed ground normals to leverage the geometric correlation between depth and normal.
Monocular Depth Estimation via Deep Structured Models with Ordinal Constraints
It is shown that a very limited number of user clicks can greatly boost monocular depth estimation performance and overcome monocular ambiguities, and that inference in the proposed model can be solved efficiently with a feed-forward network.


Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture
  • D. Eigen, R. Fergus
  • Computer Science
    2015 IEEE International Conference on Computer Vision (ICCV)
  • 2015
In this paper we address three different computer vision tasks using a single basic architecture: depth prediction, surface normal estimation, and semantic labeling, using a multi-scale convolutional network.
Single-Image Depth Perception in the Wild
Experiments show that the proposed algorithm, combined with existing RGB-D data and the new relative depth annotations, significantly improves single-image depth perception in the wild.
Discriminatively Trained Dense Surface Normal Estimation
This work proposes a method that combines contextual and segment-based cues and builds a regressor in a boosting framework by transforming the problem into the regression of coefficients of a local coding for dense surface normal estimation from a single image.
Designing deep networks for surface normal estimation
This paper proposes to build upon the decades of hard work in 3D scene understanding to design a new CNN architecture for the task of surface normal estimation, and shows that incorporating several constraints and meaningful intermediate representations in the architecture leads to state-of-the-art performance on surface normal estimation.
Deeper Depth Prediction with Fully Convolutional Residual Networks
A fully convolutional architecture, encompassing residual learning, to model the ambiguous mapping between monocular images and depth maps is proposed and a novel way to efficiently learn feature map up-sampling within the network is presented.
Unsupervised Monocular Depth Estimation with Left-Right Consistency
This paper proposes a novel training objective that enables a convolutional neural network to learn single-image depth estimation despite the absence of ground-truth depth data, producing state-of-the-art results for monocular depth estimation on the KITTI driving dataset.
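The full training objective also includes photometric reconstruction and smoothness terms; the left-right consistency term alone can be sketched as below, where the warping uses nearest-neighbor sampling for brevity (real implementations use differentiable bilinear sampling), and the function names are illustrative:

```python
import numpy as np

def warp_with_disparity(img_right, disp_left):
    """Sample the right-view map at x - d(x) for each left pixel.
    Nearest-neighbor sampling for brevity; bilinear in practice."""
    h, w = disp_left.shape
    xs = np.arange(w)[None, :] - np.round(disp_left).astype(int)
    xs = np.clip(xs, 0, w - 1)  # clamp samples to the image border
    rows = np.arange(h)[:, None]
    return img_right[rows, xs]

def lr_consistency_loss(disp_left, disp_right):
    """Penalize disagreement between the left disparity map and the
    right disparity map warped into the left view."""
    disp_right_warped = warp_with_disparity(disp_right, disp_left)
    return float(np.mean(np.abs(disp_left - disp_right_warped)))
```

When the two predicted disparity maps describe the same consistent geometry, the warped right disparity matches the left one and the loss vanishes.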
Marr Revisited: 2D-3D Alignment via Surface Normal Prediction
A skip-network model built on the pre-trained Oxford VGG convolutional neural network (CNN) for surface normal prediction achieves state-of-the-art accuracy on the NYUv2 RGB-D dataset, and recovers fine object detail compared to previous methods.
Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs
This paper tackles the challenging and essentially underdetermined problem of depth and surface normal estimation from monocular images by regression on deep convolutional neural network (DCNN) features, combined with a post-processing refinement step using conditional random fields (CRFs).
Just Look at the Image: Viewpoint-Specific Surface Normal Prediction for Improved Multi-View Reconstruction
  • S. Galliani, K. Schindler
  • Computer Science
    2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
We present a multi-view reconstruction method that combines conventional multi-view stereo (MVS) with appearance-based normal prediction, to obtain dense and accurate 3D surface models.
Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue
This work proposes an unsupervised framework to learn a deep convolutional neural network for single-view depth prediction, without requiring a pre-training stage or annotated ground-truth depth, and shows that the network, trained on less than half of the KITTI dataset, gives performance comparable to state-of-the-art supervised methods for single-view depth estimation.