Occlusion-Aware Depth Estimation with Adaptive Normal Constraints

@article{Long2020OcclusionAwareDE,
  title={Occlusion-Aware Depth Estimation with Adaptive Normal Constraints},
  author={Xiaoxiao Long and Lingjie Liu and Christian Theobalt and Wenping Wang},
  journal={ArXiv},
  year={2020},
  volume={abs/2004.00845}
}
We present a new learning-based method for multi-frame depth estimation from a color video, which is a fundamental problem in scene understanding, robot navigation, and handheld 3D reconstruction. While recent learning-based methods estimate depth at high accuracy, 3D point clouds exported from their depth maps often fail to preserve important geometric features (e.g., corners, edges, planes) of man-made scenes. Widely used pixel-wise depth errors do not specifically penalize inconsistency on…
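As a rough illustration of why normal-based supervision helps where pixel-wise depth errors fall short, the sketch below derives per-pixel surface normals from a depth map and penalizes their angular disagreement. This is a minimal generic version, not the paper's adaptive constraint; the pinhole back-projection and the simple cross-product normals are assumptions made here for illustration.

import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (H, W) to a 3D point map (H, W, 3)
    using pinhole intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

def normals_from_points(points):
    """Estimate per-pixel surface normals from cross products of
    forward differences of the point map."""
    dzdx = points[:, 1:, :] - points[:, :-1, :]   # horizontal neighbor
    dzdy = points[1:, :, :] - points[:-1, :, :]   # vertical neighbor
    n = np.cross(dzdx[:-1], dzdy[:, :-1])         # (H-1, W-1, 3)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8
    return n

def normal_consistency_loss(depth_pred, depth_gt, fx, fy, cx, cy):
    """Penalize angular disagreement (1 - cosine similarity) between
    normals derived from predicted and ground-truth depth."""
    n_pred = normals_from_points(depth_to_points(depth_pred, fx, fy, cx, cy))
    n_gt = normals_from_points(depth_to_points(depth_gt, fx, fy, cx, cy))
    return np.mean(1.0 - np.sum(n_pred * n_gt, axis=-1))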
Deep Multi-view Depth Estimation with Predicted Uncertainty
TLDR
This paper employs a dense-optical-flow network to compute correspondences, triangulates them into a point cloud to obtain an initial depth map, and introduces a depth-refinement network (DRN) that optimizes the initial depth map based on the image's contextual cues.
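For the triangulation step, a standard linear (DLT) triangulation of a single correspondence looks roughly like the sketch below; the paper's dense flow-based pipeline is more involved, and the function and variable names here are hypothetical.

import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one correspondence: x1 and x2 are
    (u, v) pixel coordinates in two views with 3x4 projection matrices
    P1 and P2. Returns the 3D point in world coordinates."""
    A = np.stack([x1[0] * P1[2] - P1[0],
                  x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0],
                  x2[1] * P2[2] - P2[1]])
    _, _, vt = np.linalg.svd(A)       # solution is the right null vector
    X = vt[-1]
    return X[:3] / X[3]               # dehomogenize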
Adaptive Surface Normal Constraint for Depth Estimation
TLDR
This work introduces a simple yet effective method, named the Adaptive Surface Normal (ASN) constraint, that correlates depth estimation with geometric consistency by adaptively determining reliable local geometry from a set of randomly sampled candidates to derive the surface normal constraint.
Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks
TLDR
A novel method for multi-view depth estimation from a single video, a critical task in applications such as perception, reconstruction, and robot navigation, that achieves higher depth-estimation accuracy and a significant speedup over state-of-the-art methods.
The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth
TLDR
ManyDepth is proposed, an adaptive approach to dense depth estimation that can make use of sequence information at test time when it is available; taking inspiration from multi-view stereo, it is a deep end-to-end cost-volume-based approach trained using self-supervision only.
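Several of the multi-view methods above, ManyDepth included, build on a plane-sweep cost volume. A minimal sketch of that idea follows; the nearest-neighbor sampling and L1 matching cost here are simplifications standing in for the learned features and losses of the actual systems, and all names are assumptions.

import numpy as np

def plane_sweep_cost_volume(ref_feat, src_feat, K, R, t, depths):
    """For each hypothesized depth, warp the source feature map into the
    reference view and score the match with an absolute-difference cost.
    ref_feat/src_feat: (H, W, C) feature maps; K: 3x3 intrinsics;
    R, t: source-camera pose relative to the reference camera."""
    h, w, _ = ref_feat.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x HW
    K_inv = np.linalg.inv(K)
    volume = np.empty((len(depths), h, w))
    for i, d in enumerate(depths):
        cam = (K_inv @ pix) * d                  # back-project at depth d
        proj = K @ (R @ cam + t[:, None])        # reproject into source view
        us = np.round(proj[0] / proj[2]).astype(int).reshape(h, w)
        vs = np.round(proj[1] / proj[2]).astype(int).reshape(h, w)
        valid = (us >= 0) & (us < w) & (vs >= 0) & (vs < h)
        sampled = np.zeros_like(ref_feat)
        sampled[valid] = src_feat[vs[valid], us[valid]]  # nearest-neighbor warp
        volume[i] = np.abs(ref_feat - sampled).mean(axis=-1)  # L1 cost
    return volume  # (D, H, W); argmin over D gives a coarse per-pixel depth index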
Space-time Neural Irradiance Fields for Free-Viewpoint Video
TLDR
A method that learns a spatiotemporal neural irradiance field for dynamic scenes from a single video using the scene depth estimated from video depth estimation methods, aggregating contents from individual frames into a single global representation.
NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video
TLDR
To the best of the authors' knowledge, this is the first learning-based system able to reconstruct dense, coherent 3D geometry in real time, and it outperforms state-of-the-art methods in terms of both accuracy and speed.
Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry
TLDR
Qualitative evaluation demonstrates that the proposed MaGNet method is more robust against challenging artifacts such as textureless/reflective surfaces and moving objects, and achieves state-of-the-art performance on ScanNet, 7-Scenes, and KITTI.
A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo
TLDR
A novel solver is introduced that iteratively solves for per-view depth and normal maps by optimizing an energy potential based on a locally planar assumption; it can be trained end-to-end and consistently improves depth quality over both conventional and deep-learning-based MVS pipelines.
SparseNeuS: Fast Generalizable Neural Surface Reconstruction from Sparse Views
TLDR
This work introduces SparseNeuS, a novel neural rendering based method for the task of surface reconstruction from multi-view images that not only outperforms the state-of-the-art methods, but also exhibits good efficiency, generalizability, and flexibility.
Neural 3D Scene Reconstruction with the Manhattan-world Assumption
TLDR
This work shows that the planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods, and proposes a novel loss that jointly optimizes the scene geometry and semantics in 3D space.
...

References

Enforcing Geometric Constraints of Virtual Normal for Depth Prediction
TLDR
This work shows the importance of high-order 3D geometric constraints for depth prediction by designing a loss term that enforces one simple type of geometric constraint, namely virtual normal directions determined by three randomly sampled points in the reconstructed 3D space, which considerably improves depth prediction accuracy.
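A minimal sketch of the virtual-normal idea: sample random point triplets from the predicted and ground-truth point clouds and compare the normals of the planes they span. The actual method also rejects near-collinear triplets, which is omitted here; the function name and the exact penalty are assumptions.

import numpy as np

def virtual_normal_loss(points_pred, points_gt, n_triplets=1000, eps=1e-8, seed=0):
    """points_pred/points_gt: (N, 3) back-projected 3D points in
    one-to-one correspondence. Samples triplets, forms the unit normal
    of the plane through each triplet, and penalizes the difference."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(points_pred), size=(n_triplets, 3))

    def triplet_normals(pts):
        a, b, c = pts[idx[:, 0]], pts[idx[:, 1]], pts[idx[:, 2]]
        n = np.cross(b - a, c - a)            # plane normal of the triplet
        return n / (np.linalg.norm(n, axis=-1, keepdims=True) + eps)

    return np.mean(np.linalg.norm(triplet_normals(points_pred)
                                  - triplet_normals(points_gt), axis=-1))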
Normal Assisted Stereo Depth Estimation
TLDR
A novel consistency loss is used to train an independent consistency module that refines depths from depth/normal pairs; the joint learning is found to improve both normal and depth prediction, and accuracy and smoothness are further improved by enforcing the consistency.
Deep Ordinal Regression Network for Monocular Depth Estimation
TLDR
The proposed deep ordinal regression network (DORN) achieves state-of-the-art results on three challenging benchmarks, i.e., KITTI, Make3D, and NYU Depth v2, and outperforms existing methods by a large margin.
Depth Map Prediction from a Single Image using a Multi-Scale Deep Network
TLDR
This paper employs two deep network stacks, one that makes a coarse global prediction based on the entire image and another that refines this prediction locally, and applies a scale-invariant error to measure depth relations rather than scale.
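The scale-invariant error from this paper is simple enough to state directly: with d_i = log y_i - log y_i*, it is D = (1/n) sum(d_i^2) - (lambda/n^2) (sum(d_i))^2, where lambda = 1 gives full scale invariance and the paper trains with lambda = 0.5. A direct NumPy transcription:

import numpy as np

def scale_invariant_log_error(pred, gt, lam=1.0, eps=1e-8):
    """Scale-invariant log error of Eigen et al.:
    D = mean(d^2) - lam * mean(d)^2 with d = log(pred) - log(gt).
    lam=1.0 is the fully scale-invariant metric; lam=0.5 is the
    training compromise between absolute and relative error."""
    d = np.log(pred + eps) - np.log(gt + eps)
    return np.mean(d ** 2) - lam * np.mean(d) ** 2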
Deep Depth Completion of a Single RGB-D Image
TLDR
A deep network is trained that takes an RGB image as input and predicts dense surface normals and occlusion boundaries, which are then combined with raw depth observations provided by the RGB-D camera to solve for the depths of all pixels, including those missing in the original observation.
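A toy version of that final solve, assuming the predicted normals have already been converted into target depth gradients gx, gy (an assumption made here; the paper's global optimization also weights terms by predicted occlusion boundaries): complete the depth by linear least squares so it matches the sparse observations while its finite differences match the targets.

import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import lsqr

def complete_depth(sparse_depth, valid, gx, gy, w_data=1.0):
    """sparse_depth: (H, W) raw depth with holes; valid: (H, W) bool mask
    of observed pixels; gx, gy: target horizontal/vertical depth gradients."""
    h, w = sparse_depth.shape
    idx = np.arange(h * w).reshape(h, w)
    rows, cols, vals, rhs = [], [], [], []
    eq = 0
    # data term: d[p] = observed depth at valid pixels
    for i, j in zip(*np.nonzero(valid)):
        rows.append(eq); cols.append(idx[i, j]); vals.append(w_data)
        rhs.append(w_data * sparse_depth[i, j]); eq += 1
    # horizontal term: d[i, j+1] - d[i, j] = gx[i, j]
    for i in range(h):
        for j in range(w - 1):
            rows += [eq, eq]; cols += [idx[i, j + 1], idx[i, j]]
            vals += [1.0, -1.0]; rhs.append(gx[i, j]); eq += 1
    # vertical term: d[i+1, j] - d[i, j] = gy[i, j]
    for i in range(h - 1):
        for j in range(w):
            rows += [eq, eq]; cols += [idx[i + 1, j], idx[i, j]]
            vals += [1.0, -1.0]; rhs.append(gy[i, j]); eq += 1
    A = coo_matrix((vals, (rows, cols)), shape=(eq, h * w))
    d = lsqr(A.tocsr(), np.array(rhs))[0]
    return d.reshape(h, w)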
Neural RGB→D Sensing: Depth and Uncertainty From a Video Camera
TLDR
This paper proposes a deep learning method to estimate per-pixel depth and its uncertainty continuously from a monocular video stream, with the goal of effectively turning an RGB camera into an RGB-D camera.
MVDepthNet: Real-Time Multiview Depth Estimation Neural Network
TLDR
MVDepthNet is presented, a convolutional network that solves the depth estimation problem given several image-pose pairs from a localized monocular camera in neighboring viewpoints; it is shown that this method can generate depth maps efficiently and precisely.
Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images
We address the problem of inferring the pose of an RGB-D camera relative to a known 3D scene, given only a single acquired image. Our approach employs a regression forest that is capable of inferring per-pixel correspondences between image points and 3D scene coordinates, from which the camera pose is estimated.
Deep convolutional neural fields for depth estimation from a single image
TLDR
A deep structured learning scheme that learns the unary and pairwise potentials of a continuous CRF in a unified deep CNN framework, and can be used for depth estimation of general scenes with no geometric priors or extra information injected.
3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions
TLDR
3DMatch is presented, a data-driven model that learns a local volumetric patch descriptor for establishing correspondences between partial 3D data that consistently outperforms other state-of-the-art approaches by a significant margin.
...