Self-Supervised Learning of Depth and Motion Under Photometric Inconsistency

  • Tianwei Shen, Lei Zhou, Zixin Luo, Yao Yao, Shiwei Li, Jiahui Zhang, Tian Fang, Long Quan
  • Published 19 September 2019
  • Computer Science
  • 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
The self-supervised learning of depth and pose from monocular sequences is an attractive solution because it uses the photometric consistency of nearby frames as supervision and so depends much less on ground-truth data. In this paper, we address the case where the assumptions of previous self-supervised approaches are violated by the dynamic nature of real-world scenes. Rather than handling the noise as uncertainty, our key idea is to incorporate more robust geometric quantities and enforce internal…
Self-Supervised Learning of Depth and Ego-Motion from Video by Alternative Training and Geometric Constraints from 3D to 2D
This paper aims to improve depth-pose learning performance without auxiliary tasks, and to address the above issues by training each task alternately and incorporating epipolar geometric constraints into the Iterative Closest Point (ICP) based point cloud matching process.
MVP: Unified Motion and Visual Self-Supervised Learning for Large-Scale Robotic Navigation
This paper proposes a novel motion and visual perception approach, dubbed MVP, that unifies these two sensor modalities for large-scale, target-driven navigation tasks. It can learn faster and is more accurate and more robust to both extreme environmental changes and poor GPS data than corresponding vision-only navigation methods.
Deep Matching Prior: Test-Time Optimization for Dense Correspondence
It is shown that an image pair-specific prior can be captured solely by optimizing untrained matching networks on an input pair of images. This framework, dubbed Deep Matching Prior (DMP), is competitive with, or even outperforms, the latest learning-based methods on several benchmarks, even though it requires neither large training data nor intensive learning.
M^3VSNet: Unsupervised Multi-metric Multi-view Stereo Network
A novel unsupervised multi-metric MVS network, named M^3VSNet, is proposed for dense point cloud reconstruction without any supervision. It combines pixel-wise and feature-wise loss functions and incorporates normal-depth consistency in the 3D point cloud format to improve the accuracy and continuity of the estimated depth maps.


Beyond Photometric Loss for Self-Supervised Ego-Motion Estimation
This paper bridges the gap between geometric loss and photometric loss by introducing a matching loss constrained by epipolar geometry in a self-supervised framework, and the resulting method outperforms state-of-the-art unsupervised ego-motion estimation methods by a large margin.
GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose
An adaptive geometric consistency loss is proposed to increase robustness towards outliers and non-Lambertian regions, which resolves occlusions and texture ambiguities effectively. The method achieves state-of-the-art results in all three tasks, performing better than previous unsupervised methods and comparably with supervised ones.
Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints
The main contribution is to explicitly consider the inferred 3D geometry of the whole scene and to enforce consistency of the estimated 3D point clouds and ego-motion across consecutive frames; the approach outperforms the state of the art for both breadth and depth.
Unsupervised Learning of Depth and Ego-Motion from Video
Empirical evaluation demonstrates the effectiveness of the unsupervised learning framework: monocular depth performs comparably with supervised methods that use either ground-truth pose or depth for training, and pose estimation performs favorably compared to established SLAM systems under comparable input settings.
Learning Depth from Monocular Videos Using Direct Methods
It is argued that the depth CNN predictor can be learned without a pose CNN predictor, and it is demonstrated empirically that incorporating a differentiable implementation of DVO, along with a novel depth normalization strategy, substantially improves performance over state-of-the-art methods that use monocular videos for training.
Learning Monocular Depth Estimation with Unsupervised Trinocular Assumptions
This paper introduces a novel interleaved training procedure that makes it possible to enforce the trinocular assumption using current binocular datasets, and outperforms state-of-the-art methods for unsupervised monocular depth estimation trained on binocular stereo pairs as well as any known methods relying on other cues.
Learning Depth from Single Monocular Images
This work begins by collecting a training set of monocular images (of unstructured outdoor environments which include forests, trees, buildings, etc.) and their corresponding ground-truth depthmaps, and applies supervised learning to predict the depthmap as a function of the image.
Unsupervised Monocular Depth Estimation with Left-Right Consistency
This paper proposes a novel training objective that enables the convolutional neural network to learn to perform single image depth estimation, despite the absence of ground truth depth data, and produces state of the art results for monocular depth estimation on the KITTI driving dataset.
DeMoN: Depth and Motion Network for Learning Monocular Stereo
This work trains a convolutional network end-to-end to compute depth and camera motion from successive, unconstrained image pairs, and in contrast to the popular depth-from-single-image networks, DeMoN learns the concept of matching and better generalizes to structures not seen during training.
Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction
The use of stereo sequences for learning depth and visual odometry enables the use of both spatial and temporal photometric warp error, and constrains the scene depth and camera motion to be in a common, real-world scale.