Corpus ID: 201668102

Improving Self-Supervised Single View Depth Estimation by Masking Occlusion

@article{Schellevis2019ImprovingSS,
  title={Improving Self-Supervised Single View Depth Estimation by Masking Occlusion},
  author={Maarten Schellevis},
  journal={ArXiv},
  year={2019},
  volume={abs/1908.11112}
}
Single view depth estimation models can be trained from video footage using a self-supervised end-to-end approach with view synthesis as the supervisory signal. This is achieved with a framework that predicts depth and camera motion, with a loss based on reconstructing a target video frame from temporally adjacent frames. In this context, occlusion refers to parts of a scene that can be observed in the target frame but not in a frame used for image reconstruction. Since the image…
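The view-synthesis loss described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the masking scheme proposed in this paper, whose details are truncated in the abstract. It assumes a pinhole camera with intrinsics (fx, fy, cx, cy), a known target depth map, and a relative pose (R, t) from the target to the source camera. It samples the source image with nearest-neighbour lookup, whereas practical systems use differentiable bilinear sampling. It masks only pixels that reproject behind the source camera or outside the source image, which is one simple case of target pixels that are not visible in the source frame. The function names `reproject` and `masked_photometric_l1` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Image = List[List[float]]  # grayscale image indexed as image[row][col]
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy), assumed pinhole model


def reproject(u: float, v: float, depth: float, K: Intrinsics,
              R: Sequence[Sequence[float]], t: Sequence[float]
              ) -> Optional[Tuple[float, float, float]]:
    """Map target pixel (u, v) with the given depth into the source image.

    Computes p_s ~ K (R * depth * K^-1 [u, v, 1]^T + t). Returns (u_s, v_s, z_s),
    or None if the point lands on or behind the source camera plane.
    """
    fx, fy, cx, cy = K
    # Back-project the pixel to a 3D point in the target camera frame.
    p = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Rigidly transform the point into the source camera frame.
    xs, ys, zs = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    if zs <= 0:
        return None
    # Perspective projection back onto the source image plane.
    return fx * xs / zs + cx, fy * ys / zs + cy, zs


def masked_photometric_l1(target: Image, source: Image, depth: Image, K: Intrinsics,
                          R: Sequence[Sequence[float]], t: Sequence[float]
                          ) -> Tuple[float, List[List[bool]]]:
    """Mean |target - reconstruction| over pixels with a valid reprojection.

    Returns (loss, mask); mask[v][u] is False where the pixel was excluded.
    """
    h, w = len(target), len(target[0])
    mask = [[False] * w for _ in range(h)]
    total, count = 0.0, 0
    for v in range(h):
        for u in range(w):
            proj = reproject(u, v, depth[v][u], K, R, t)
            if proj is None:
                continue  # point is behind the source camera
            # Nearest-neighbour sampling (round half up).
            us, vs = math.floor(proj[0] + 0.5), math.floor(proj[1] + 0.5)
            if 0 <= us < w and 0 <= vs < h:  # otherwise outside the source view
                mask[v][u] = True
                total += abs(target[v][u] - source[vs][us])
                count += 1
    return (total / count if count else 0.0), mask
```

With the identity rotation, a unit depth map, and a one-unit sideways translation, the rightmost pixel of a three-pixel-wide target image reprojects past the source image border and is excluded from the loss rather than compared against an arbitrary value.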

References

Showing 1-10 of 13 references
Digging Into Self-Supervised Monocular Depth Estimation
TLDR
It is shown that a surprisingly simple model and associated design choices lead to superior predictions, together resulting in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods.
Unsupervised Monocular Depth Estimation with Left-Right Consistency
TLDR
This paper proposes a novel training objective that enables the convolutional neural network to learn to perform single image depth estimation, despite the absence of ground-truth depth data, and produces state-of-the-art results for monocular depth estimation on the KITTI driving dataset.
Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints
TLDR
The main contribution is to explicitly consider the inferred 3D geometry of the whole scene and to enforce consistency of the estimated 3D point clouds and ego-motion across consecutive frames; the resulting method outperforms the state of the art for both breadth and depth.
Unsupervised Learning of Depth and Ego-Motion from Video
TLDR
Empirical evaluation demonstrates the effectiveness of the unsupervised learning framework: monocular depth prediction performs comparably with supervised methods that use either ground-truth pose or depth for training, and pose estimation performs favorably compared to established SLAM systems under comparable input settings.
Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos
TLDR
This work addresses unsupervised learning of scene depth and robot ego-motion where supervision is provided by monocular videos, as cameras are the cheapest, least restrictive and most ubiquitous sensor for robotics.
Depth Map Prediction from a Single Image using a Multi-Scale Deep Network
TLDR
This paper employs two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally, and applies a scale-invariant error to help measure depth relations rather than scale.
Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue
TLDR
This work proposes an unsupervised framework to learn a deep convolutional neural network for single view depth prediction, without requiring a pre-training stage or annotated ground-truth depths, and shows that this network, trained on less than half of the KITTI dataset, gives performance comparable to that of state-of-the-art supervised methods for single view depth estimation.
Learning Depth from Monocular Videos Using Direct Methods
TLDR
It is argued that the depth CNN predictor can be learned without a pose CNN predictor, and it is demonstrated empirically that incorporating a differentiable implementation of DVO (Direct Visual Odometry), along with a novel depth normalization strategy, substantially improves performance over state-of-the-art methods that use monocular videos for training.
Are we ready for autonomous driving? The KITTI vision benchmark suite
TLDR
The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world.
Image quality assessment: from error visibility to structural similarity
TLDR
A structural similarity index is developed and its promise is demonstrated through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
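The Mahjourian et al. entry above describes enforcing consistency of estimated 3D point clouds and ego-motion across consecutive frames. The sketch below illustrates that idea only in simplified form. It assumes the two point clouds have already been obtained by back-projecting predicted depth maps, moves the cloud from frame t into frame t+1 with an assumed rigid motion (rotation, translation), and scores the alignment with a brute-force mean nearest-neighbour distance. This Chamfer-style score is a stand-in chosen for brevity, not the loss used in that paper, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def rigid_transform(points: Sequence[Point], rotation: Sequence[Sequence[float]],
                    translation: Sequence[float]) -> List[Point]:
    """Apply x' = R x + t to every 3D point."""
    return [tuple(sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
                  for i in range(3))
            for p in points]


def mean_nearest_neighbour_distance(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Average Euclidean distance from each src point to its closest dst point (O(n*m))."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def point_cloud_consistency(cloud_t: Sequence[Point], cloud_t1: Sequence[Point],
                            rotation: Sequence[Sequence[float]],
                            translation: Sequence[float]) -> float:
    """Low when the estimated ego-motion maps frame t's cloud onto frame t+1's cloud."""
    return mean_nearest_neighbour_distance(
        rigid_transform(cloud_t, rotation, translation), cloud_t1)
```

For two example clouds that differ only by a one-unit shift along the optical axis, a translation of (0, 0, -1) aligns them exactly and scores 0.0, while assuming no motion leaves every point one unit from its match and scores 1.0.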
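The Eigen et al. entry above mentions a scale-invariant error. As defined in that paper, with d_i = log y_i - log y*_i for predicted depth y and ground-truth depth y* over n pixels, the error is (1/n) Σ d_i² - (λ/n²)(Σ d_i)². Setting λ = 1 makes it exactly invariant to a global scaling of the prediction, and the paper trains with λ = 0.5. A minimal sketch, with a hypothetical function name:

```python
import math
from typing import Sequence


def scale_invariant_error(pred: Sequence[float], gt: Sequence[float], lam: float = 1.0) -> float:
    """(1/n) * sum(d_i^2) - (lam / n^2) * (sum d_i)^2 with d_i = log(pred_i) - log(gt_i).

    Depths must be positive. lam=1.0 ignores any global scale factor on pred.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    d = [math.log(p) - math.log(g) for p, g in zip(pred, gt)]
    n = len(d)
    return sum(x * x for x in d) / n - lam * sum(d) ** 2 / n ** 2
```

Doubling every predicted depth shifts each d_i by the same log 2, so with λ = 1 the error stays zero up to floating-point rounding, whereas a plain mean squared log error would penalize it by (log 2)².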
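The SSIM entry above refers to the structural similarity index, which compares means, variances, and covariance: SSIM(x, y) = ((2 μ_x μ_y + C1)(2 σ_xy + C2)) / ((μ_x² + μ_y² + C1)(σ_x² + σ_y² + C2)), with C1 = (K1 L)², C2 = (K2 L)², K1 = 0.01, K2 = 0.03, and L the dynamic range of pixel values. The sketch below evaluates this formula over a single window spanning the whole signal; the standard index instead averages it over local windows (the SSIM paper uses an 11×11 Gaussian weighting). The function name `global_ssim` is hypothetical.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float], dynamic_range: float = 255.0,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """SSIM of two equal-length signals, computed over one window spanning all samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    c1, c2 = (k1 * dynamic_range) ** 2, (k2 * dynamic_range) ** 2
    mx, my = sum(x) / n, sum(y) / n
    # Sample (n - 1) estimates of variance and covariance.
    vx = sum((a - mx) * (a - mx) for a in x) / (n - 1)
    vy = sum((b - my) * (b - my) for b in y) / (n - 1)
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))
```

In self-supervised depth estimation, SSIM is commonly combined with an L1 term in the photometric reconstruction loss (Godard et al., listed above, weight the (1 - SSIM)/2 term by 0.85); the truncated abstract does not state whether or how this paper combines them.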