Corpus ID: 237213280

StructDepth: Leveraging the structural regularities for self-supervised indoor depth estimation

Boying Li, Yuan Huang, Zeyu Liu, Danping Zou, Wenxian Yu
Self-supervised monocular depth estimation has achieved impressive performance on outdoor datasets. However, its performance degrades notably in indoor environments because of the lack of textures. Without rich textures, the photometric consistency is too weak to train a good depth network. Inspired by early works on indoor modeling, we leverage the structural regularities exhibited in indoor scenes to train a better depth network. Specifically, we adopt two extra supervisory signals for…
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
A structure distillation approach is proposed that learns knacks from a pretrained depth estimator, which produces structured but metric-agnostic depth owing to its in-the-wild mixed-dataset training, laying a solid basis for practical indoor depth estimation via self-supervision.
ColDE: A Depth Estimation Framework for Colonoscopy Reconstruction
A set of training losses was designed to deal with the special challenges of colonoscopy data, using both depth and surface normal information, and the classic photometric loss was extended with feature matching to compensate for illumination noise.


P2Net: Patch-match and Plane-regularization for Unsupervised Indoor Depth Estimation
This paper argues that the poor performance of unsupervised depth estimation in indoor environments stems from non-discriminative point-based matching, and proposes P$^2$Net, which outperforms existing approaches by a large margin.
Digging Into Self-Supervised Monocular Depth Estimation
It is shown that a surprisingly simple model, and associated design choices, lead to superior predictions, and together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods.
Unsupervised Depth Learning in Challenging Indoor Video: Weak Rectification to Rescue
This work establishes that the degenerate camera motions exhibited in handheld settings are a critical obstacle for unsupervised depth learning and proposes a novel data pre-processing method for effective training, i.e., search for image pairs with modest translation and remove their rotation via the proposed weak image rectification.
LEGO: Learning Edge with Geometry all at Once by Watching Videos
This paper introduces a "3D as-smooth-as-possible (3D-ASAP)" prior inside the pipeline, which enables joint estimation of edges and 3D scene, yielding results with significant improvement in accuracy for fine detailed structures.
Unsupervised Learning of Geometry From Videos With Edge-Aware Depth-Normal Consistency
A surface normal representation is introduced into an unsupervised depth estimation framework: estimated depths are constrained to be compatible with predicted normals, yielding more robust geometry results and vastly outperforming state-of-the-art methods, which demonstrates the benefits of the approach.
Enforcing Geometric Constraints of Virtual Normal for Depth Prediction
This work shows the importance of the high-order 3D geometric constraints for depth prediction by designing a loss term that enforces one simple type of geometric constraints, namely, virtual normal directions determined by randomly sampled three points in the reconstructed 3D space, to considerably improve the depth prediction accuracy.
Moving Indoor: Unsupervised Video Depth Learning in Challenging Environments
This work proposes a new optical-flow based training paradigm which reduces the difficulty of unsupervised learning by providing a clearer training target and handles the non-texture regions.
Self-Supervised Learning With Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera
We present GLNet, a self-supervised framework for learning depth, optical flow, camera pose and intrinsic parameters from monocular video -- addressing the difficulty of acquiring realistic
Deeper Depth Prediction with Fully Convolutional Residual Networks
A fully convolutional architecture, encompassing residual learning, to model the ambiguous mapping between monocular images and depth maps is proposed and a novel way to efficiently learn feature map up-sampling within the network is presented.
Feature-metric Loss for Self-supervised Learning of Depth and Egomotion
A feature-metric loss is proposed, defined on a feature representation that is itself learned in a self-supervised manner and regularized by both first-order and second-order derivatives to constrain the loss landscape to form proper convergence basins.
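
The virtual normal constraint summarized in the "Enforcing Geometric Constraints of Virtual Normal for Depth Prediction" entry above can be illustrated with a small sketch. It assumes a pinhole camera, depth maps stored as nested Python lists, and an L1 difference between normals. The function names, the `intrinsics` tuple layout, and the collinearity threshold are hypothetical choices made for illustration, not the authors' implementation, which may filter and weight triplets differently.

```python
import math
import random


def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with the given depth to a 3D camera-frame point (pinhole model)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def virtual_normal(p0, p1, p2):
    """Unit normal of the plane through three 3D points, or None if they are near-collinear."""
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    # Cross product a x b is perpendicular to the plane spanned by the triplet.
    n = (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-8:  # assumed threshold for rejecting degenerate triplets
        return None
    return tuple(c / norm for c in n)


def virtual_normal_loss(pred_depth, ref_depth, intrinsics, num_triplets=100, seed=0):
    """Mean L1 difference between virtual normals of a predicted and a reference depth map."""
    fx, fy, cx, cy = intrinsics
    h, w = len(pred_depth), len(pred_depth[0])
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(num_triplets):
        # Sample three pixels at random; the same pixels are used for both depth maps.
        pixels = [(rng.randrange(w), rng.randrange(h)) for _ in range(3)]
        pts_pred = [backproject(u, v, pred_depth[v][u], fx, fy, cx, cy) for u, v in pixels]
        pts_ref = [backproject(u, v, ref_depth[v][u], fx, fy, cx, cy) for u, v in pixels]
        n_pred = virtual_normal(*pts_pred)
        n_ref = virtual_normal(*pts_ref)
        if n_pred is None or n_ref is None:
            continue  # skip triplets that do not define a plane
        total += sum(abs(a - b) for a, b in zip(n_pred, n_ref))
        count += 1
    return total / count if count else 0.0
```

Because each normal depends on three points that may lie far apart in the image, a loss of this kind penalizes errors in the global 3D shape rather than only per-pixel depth differences, which is the sense in which the entry calls the constraint high-order.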