MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments from a Single Moving Camera

@article{Wimbauer2021MonoRecSD,
  title={MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments from a Single Moving Camera},
  author={Felix Wimbauer and Nan Yang and Lukas von Stumberg and Niclas Zeller and Daniel Cremers},
  journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={6108-6118}
}
  • Felix Wimbauer, Nan Yang, Lukas von Stumberg, Niclas Zeller, Daniel Cremers
  • Published 24 November 2020
  • Computer Science
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
In this paper, we propose MonoRec, a semi-supervised monocular dense reconstruction architecture that predicts depth maps from a single moving camera in dynamic environments. MonoRec is based on a multi-view stereo setting which encodes the information of multiple consecutive images in a cost volume. To deal with dynamic objects in the scene, we introduce a MaskModule that predicts moving object masks by leveraging the photometric inconsistencies encoded in the cost volumes. Unlike other multi… 
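The construction the abstract describes, a photometric cost volume over several consecutive frames, can be sketched as a classic plane sweep. The sketch below is illustrative only, assuming a pinhole camera with known intrinsics K and known relative poses; MonoRec's actual cost volume uses a weighted SSIM-based photometric error with learned modules on top, and all names here are hypothetical.

import torch
import torch.nn.functional as F

def plane_sweep_cost_volume(ref, srcs, K, K_inv, poses, inv_depths):
    """ref: (1,3,H,W) keyframe; srcs: list of (1,3,H,W) neighbouring frames;
    poses: list of (4,4) source-from-keyframe transforms; inv_depths: (D,)
    inverse-depth hypotheses. Returns a (1,D,H,W) photometric cost volume."""
    _, _, H, W = ref.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float().reshape(3, -1)
    rays = K_inv @ pix                            # back-project pixels to rays
    costs = []
    for inv_d in inv_depths:
        pts = rays / inv_d                        # 3D points at hypothesised depth
        err = 0.0
        for src, T in zip(srcs, poses):
            cam = T[:3, :3] @ pts + T[:3, 3:4]    # transform into source camera
            uv = K @ cam
            uv = uv[:2] / uv[2:].clamp(min=1e-6)  # perspective divide
            grid = torch.stack([uv[0] / (W - 1) * 2 - 1,       # normalise to
                                uv[1] / (H - 1) * 2 - 1], -1)  # [-1,1] range
            warped = F.grid_sample(src, grid.reshape(1, H, W, 2),
                                   align_corners=True)
            err = err + (warped - ref).abs().mean(1, keepdim=True)  # photo error
        costs.append(err / len(srcs))
    return torch.cat(costs, 1)  # low cost = photo-consistent depth hypothesis

For a static scene, the correct depth hypothesis yields low cost in every source frame; a moving object violates this agreement, and that per-pixel inconsistency across frames is the signal the MaskModule exploits to flag moving objects.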

Citations

TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo
TLDR
Experimental results show that TANDEM outperforms state-of-the-art traditional and learning-based monocular visual odometry methods in both camera tracking and 3D reconstruction; the system contributes a novel tracking front-end that performs dense direct image alignment using depth maps rendered from a global model built incrementally from dense depth predictions.
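As a rough illustration of what "dense direct image alignment" means here (a sketch under assumed pinhole intrinsics, not TANDEM's implementation; all names are hypothetical): keyframe pixels are lifted to 3D through the rendered depth map, projected into the current frame under a candidate pose, and tracking minimises the resulting photometric residual over that pose, typically coarse-to-fine with Gauss-Newton.

import torch
import torch.nn.functional as F

def photometric_residual(cur, keyframe, depth, T, K, K_inv):
    """cur, keyframe: (1,1,H,W) grayscale images; depth: (1,1,H,W) depth
    rendered from the global model; T: (4,4) current-from-keyframe pose.
    Returns the per-pixel residual r(T); tracking solves min_T sum r(T)^2."""
    _, _, H, W = cur.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float().reshape(3, -1)
    pts = (K_inv @ pix) * depth.reshape(1, -1)   # lift keyframe pixels to 3D
    cam = T[:3, :3] @ pts + T[:3, 3:4]           # move into the current frame
    uv = K @ cam
    uv = uv[:2] / uv[2:].clamp(min=1e-6)
    grid = torch.stack([uv[0] / (W - 1) * 2 - 1,
                        uv[1] / (H - 1) * 2 - 1], -1).reshape(1, H, W, 2)
    cur_warped = F.grid_sample(cur, grid, align_corners=True)
    return cur_warped - keyframe                 # photometric residual r(T)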
The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth
TLDR
ManyDepth is an adaptive approach to dense depth estimation that can make use of sequence information at test time when it is available; taking inspiration from multi-view stereo, it is a deep, end-to-end, cost-volume-based approach trained using self-supervision only.
Attention meets Geometry: Geometry Guided Spatial-Temporal Attention for Consistent Self-Supervised Monocular Depth Estimation
TLDR
This paper explores how the increasingly popular transformer architecture, together with novel regularized loss formulations, can improve depth consistency while preserving accuracy, and proposes a spatial attention module that correlates coarse depth predictions to aggregate local geometric information.

References

3D Packing for Self-Supervised Monocular Depth Estimation
TLDR
This work proposes a novel self-supervised monocular depth estimation method that combines geometry with a new deep network, PackNet, learned only from unlabeled monocular videos; it outperforms other self-, semi-, and fully supervised methods on the KITTI benchmark.
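The "packing" idea can be sketched as a space-to-depth downsampling that folds resolution into channels instead of discarding it with pooling. This is a simplified stand-in (PackNet additionally uses 3D convolutions in its packing/unpacking blocks), and the class name is hypothetical.

import torch.nn as nn
import torch.nn.functional as F

class PackBlock(nn.Module):
    """Downsample by r without discarding pixels: space-to-depth, then conv."""
    def __init__(self, channels, r=2):
        super().__init__()
        self.r = r
        self.conv = nn.Conv2d(channels * r * r, channels, 3, padding=1)

    def forward(self, x):
        return self.conv(F.pixel_unshuffle(x, self.r))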
Digging Into Self-Supervised Monocular Depth Estimation
TLDR
A surprisingly simple model and associated design choices are shown to lead to superior predictions, together resulting in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods.
Depth From Videos in the Wild: Unsupervised Monocular Depth Learning From Unknown Cameras
TLDR
This work is the first to learn camera intrinsic parameters, including lens distortion, from video in an unsupervised manner, making it possible to extract accurate depth and motion from arbitrary videos of unknown origin at scale.
GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose
TLDR
An adaptive geometric consistency loss is proposed to increase robustness towards outliers and non-Lambertian regions; it resolves occlusions and texture ambiguities effectively, and GeoNet achieves state-of-the-art results in all three tasks, performing better than previous unsupervised methods and comparably with supervised ones.
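The consistency idea can be illustrated with the common forward-backward check: a pixel is trusted only where the forward flow and the back-warped backward flow roughly cancel, with a tolerance that grows with flow magnitude. The exact form and thresholds below are illustrative, not GeoNet's.

import torch
import torch.nn.functional as F

def consistency_mask(flow_fw, flow_bw, alpha=3.0, beta=0.05):
    """flow_fw, flow_bw: (1,2,H,W) optical flows.
    Returns a (1,1,H,W) bool mask of geometrically consistent pixels."""
    _, _, H, W = flow_fw.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack([xs, ys], 0).float()[None] + flow_fw     # landing points
    norm = torch.stack([grid[:, 0] / (W - 1) * 2 - 1,
                        grid[:, 1] / (H - 1) * 2 - 1], -1)      # (1,H,W,2)
    bw_at_fw = F.grid_sample(flow_bw, norm, align_corners=True) # backward flow there
    diff = (flow_fw + bw_at_fw).pow(2).sum(1, keepdim=True)     # should cancel out
    bound = alpha + beta * (flow_fw.pow(2).sum(1, keepdim=True)
                            + bw_at_fw.pow(2).sum(1, keepdim=True))
    return diff < bound  # False marks likely occlusions/outliers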
A Photometrically Calibrated Benchmark For Monocular Visual Odometry
TLDR
A novel, simple approach to non-parametric vignette calibration is presented, which requires minimal set-up and is easy to reproduce, and two existing methods (ORB-SLAM and DSO) are thoroughly evaluated on the dataset.
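For context, the photometric calibration this benchmark provides lets one undo the camera response and vignetting before computing photometric errors. A toy numpy sketch follows; array names are illustrative, and the full correction also divides by exposure time.

import numpy as np

def photometric_correction(img_u8, inv_response, vignette):
    """img_u8: (H,W) uint8 raw image; inv_response: (256,) inverse camera
    response lookup table; vignette: (H,W) attenuation factors in (0,1]."""
    return inv_response[img_u8] / vignette  # ~ irradiance, up to exposure time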
MVSNet: Depth Inference for Unstructured Multi-view Stereo
TLDR
This work presents an end-to-end deep learning architecture for depth map inference from multi-view images that flexibly adapts to arbitrary N-view inputs using a variance-based cost metric that maps multiple feature volumes into one cost feature.
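The variance-based cost metric is the piece that makes the architecture agnostic to the number of views: N warped feature volumes are reduced to a single cost volume as their element-wise variance. A minimal sketch, with the tensor layout assumed and the function name hypothetical:

import torch

def variance_cost(volumes):
    """volumes: (N, C, D, H, W) feature volumes warped from N source views.
    Returns (C, D, H, W): element-wise variance across the views."""
    mean = volumes.mean(dim=0, keepdim=True)
    return ((volumes - mean) ** 2).mean(dim=0)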
Unsupervised Learning of Depth and Ego-Motion from Video
TLDR
Empirical evaluation demonstrates that the unsupervised framework's monocular depth estimation performs comparably with supervised methods that use either ground-truth pose or depth for training, and that its pose estimation performs favorably compared to established SLAM systems under comparable input settings.
CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM
TLDR
A new compact but dense representation of scene geometry is presented, conditioned on the intensity data from a single image and generated from a code consisting of a small number of parameters.
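The key property of such a code, that dense geometry is differentiable in a handful of parameters, is what makes it optimisable inside SLAM. A toy sketch of refining the code against a residual; the decoder, residual function, and names are stand-ins, not the paper's system.

import torch

def refine_code(decoder, image_feats, code, residual_fn, steps=50, lr=0.1):
    """Optimise the compact code (not the dense depth map) so that the decoded
    depth minimises a SLAM residual, e.g. photometric or reprojection error."""
    code = code.clone().requires_grad_(True)
    opt = torch.optim.Adam([code], lr=lr)
    for _ in range(steps):
        depth = decoder(image_feats, code)  # dense depth from few parameters
        loss = residual_fn(depth)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return code.detach()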
Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction
TLDR
The use of stereo sequences for learning depth and visual odometry enables both spatial and temporal photometric warp errors to be exploited, and constrains the scene depth and camera motion to a common, real-world scale.