Corpus ID: 221535569

Self-Supervised Scale Recovery for Monocular Depth and Egomotion Estimation

Brandon Wagstaff and Jonathan Kelly
The self-supervised loss formulation for jointly training depth and egomotion neural networks with monocular images is well studied and has demonstrated state-of-the-art accuracy. One of the main limitations of this approach, however, is that the depth and egomotion estimates are only determined up to an unknown scale. In this paper, we present a novel scale recovery loss that enforces consistency between a known camera height and the estimated camera height, generating metric (scaled) depth…
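The core idea described in the abstract can be sketched as follows: backproject the predicted depths of ground-plane pixels into 3D, fit a plane to those points, take the camera-to-plane distance as the estimated camera height, and penalize its deviation from the known height. Below is a minimal NumPy sketch of this idea; the function names, the SVD plane fit, and the L1 penalty are illustrative assumptions, not the paper's exact formulation (which operates on network outputs inside a training loss):

```python
import numpy as np

def backproject(depth, K):
    """Backproject a depth map into 3D points in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T          # unit-depth rays
    return rays * depth.reshape(-1, 1)       # scale each ray by its depth

def estimated_camera_height(points):
    """Fit a plane to (assumed) ground points via SVD; the camera height
    is the distance from the camera origin to that plane."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]                          # direction of least variance
    return abs(normal @ centroid)            # point-to-plane distance, unit normal

def scale_recovery_loss(depth, K, known_height, ground_mask):
    """L1 penalty between the known and the estimated camera height
    (a stand-in for the paper's scale recovery loss)."""
    pts = backproject(depth, K)[ground_mask.reshape(-1)]
    return abs(estimated_camera_height(pts) - known_height)
```

Because the penalty is differentiable in the predicted depths (when implemented with an autodiff framework), driving it to zero pulls the whole depth map, and hence the egomotion, to metric scale.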

Figures and Tables from this paper

Accurate and Robust Scale Recovery for Monocular Visual Odometry Based on Plane Geometry
A light-weight scale recovery framework leveraging an accurate and robust estimation of the ground plane and solving a least-squares problem using a RANSAC-based optimizer to achieve a highly accurate scale recovery.
Data-driven Holistic Framework for Automated Laparoscope Optimal View Control with Learning-based Depth Perception
A novel rotation constraint using an affine map to minimize the visual warping problem, and a null-space controller is also embedded into the framework to optimize all types of errors in a unified and decoupled manner.
Self-Supervised Structure-from-Motion through Tightly-Coupled Depth and Egomotion Networks
This work addresses the open problem of how to optimally couple the depth and egomotion network components, introduces several notions of coupling, categorizes existing approaches, and presents a novel tightly-coupled approach that leverages the interdependence of depth and egomotion at training and at inference time.


Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video
This paper proposes a geometry consistency loss for scale-consistent predictions and an induced self-discovered mask for handling moving objects and occlusions, and is the first work to show that deep networks trained using unlabelled monocular videos can predict globally scale-consistent camera trajectories over a long video sequence.
Unsupervised Learning of Depth and Ego-Motion from Video
Empirical evaluation demonstrates the effectiveness of the unsupervised learning framework: monocular depth estimation performs comparably with supervised methods that use either ground-truth pose or depth for training, and pose estimation performs favorably compared to established SLAM systems under comparable input settings.
Scale Recovery for Monocular Visual Odometry Using Depth Estimated with Deep Convolutional Neural Fields
This work proposes a novel method to recover the scale by incorporating the depths estimated from images using deep convolutional neural fields, which considers the whole environmental structure as reference rather than a specified plane.
Unsupervised Monocular Depth Estimation with Left-Right Consistency
This paper proposes a novel training objective that enables a convolutional neural network to learn to perform single-image depth estimation, despite the absence of ground-truth depth data, and produces state-of-the-art results for monocular depth estimation on the KITTI driving dataset.
Multimodal scale estimation for monocular visual odometry
This work addresses the problem of monocular scale estimation by proposing a multimodal mechanism of prediction, classification, and correction and employs classifiers to detect scale outliers based on various features (e.g. moments on residuals).
Masked GAN for Unsupervised Depth and Pose Prediction With Scale Consistency
A masked generative adversarial network (GAN) for unsupervised monocular depth and ego-motion estimation is proposed, designed to eliminate the effects of occlusions and the impact of visual field changes on the reconstruction loss and adversarial loss.
Metrically-Scaled Monocular SLAM using Learned Scale Factors
  • W. N. Greene and N. Roy, in 2020 IEEE International Conference on Robotics and Automation (ICRA), 2020
We propose an efficient method for monocular simultaneous localization and mapping (SLAM) that is capable of estimating metrically-scaled motion without additional sensors or hardware acceleration by…
Self-Supervised Deep Pose Corrections for Robust Visual Odometry
Through extensive experiments, it is shown that the self-supervised DPC network can significantly enhance the performance of classical monocular and stereo odometry estimators and substantially outperforms state-of-the-art learning-only approaches.
High Accuracy Monocular SFM and Scale Correction for Autonomous Driving
A novel data-driven mechanism for cue combination is presented that allows highly accurate ground plane estimation by adapting the observation covariances of multiple cues, such as sparse feature matching and dense inter-frame stereo, based on their relative confidences inferred from visual data on a per-frame basis.
Improving Learning-based Ego-motion Estimation with Homomorphism-based Losses and Drift Correction
A novel cost function for learning-based VO is defined that considers the mathematical properties of the group homomorphism, and it is proposed to reduce VO drift by estimating drivable regions using semantic segmentation and to incorporate this information into a pose-graph optimization.