Monocular Depth Estimation via Listwise Ranking using the Plackett-Luce Model

  • Julian Lienen, E. Hullermeier
  • Published 25 October 2020
  • Computer Science
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
In many real-world applications, the relative depth of objects in an image is crucial for scene understanding. Recent approaches mainly tackle the problem of depth prediction in monocular images by treating the problem as a regression task. Yet, being interested in an order relation in the first place, ranking methods suggest themselves as a natural alternative to regression, and indeed, ranking approaches leveraging pairwise comparisons as training information ("object A is closer to the… 
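
The abstract is cut off before it describes the method, so the following is only a minimal sketch of the textbook Plackett-Luce model named in the title, not the authors' actual loss. It computes the negative log-likelihood of one ranking of items given real-valued scores. The function names, the plain-list inputs, and the convention that a higher score means an earlier rank are all assumptions made for illustration.

```python
import math


def _logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    if m == -math.inf:
        return -math.inf
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def plackett_luce_nll(scores, ranking):
    """Negative log-likelihood of `ranking` under a Plackett-Luce model.

    scores  : one real-valued score per item (assumed: higher = ranked earlier).
    ranking : item indices ordered from first-ranked to last-ranked.

    The model picks items one at a time; the item at position i is chosen
    with probability exp(s_i) / sum_{j >= i} exp(s_j) among those remaining.
    """
    ordered = [scores[i] for i in ranking]
    nll = 0.0
    log_denominator = -math.inf
    # Walk from the last-ranked item to the first so that the log-sum-exp
    # over the "remaining" items can be accumulated incrementally.
    for s in reversed(ordered):
        log_denominator = _logaddexp(log_denominator, s)
        nll += log_denominator - s
    return nll


# Hypothetical scores for three items; the ranking that agrees with the
# score order gets a lower loss than the reversed ranking.
scores = [2.0, 0.5, -1.0]
agreeing = plackett_luce_nll(scores, [0, 1, 2])
reversed_order = plackett_luce_nll(scores, [2, 1, 0])
```

With equal scores every permutation is equally likely, so the loss for three items is log 6 regardless of the ranking, which is a quick sanity check on the implementation.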

Perception and Navigation in Autonomous Systems in the Era of Learning: A Survey
This survey delineates the shortcomings of existing classical visual simultaneous localization and mapping (vSLAM) solutions, which demonstrate the need to integrate deep learning techniques, and reviews learning-based visual navigation systems.
Single Image Depth Estimation: An Overview
Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer
This work proposes a robust training objective that is invariant to changes in depth range and scale, advocates the use of principled multi-objective learning to combine data from different sources, and highlights the importance of pretraining encoders on auxiliary tasks.
Learning Single-Image Depth From Videos Using Quality Assessment Networks
This paper proposes a method to automatically generate single-view depth training data by running Structure-from-Motion (SfM) on Internet videos, using a Quality Assessment Network to identify high-quality SfM reconstructions.
Structure-Guided Ranking Loss for Single Image Depth Prediction
This work proposes to use a simple pair-wise ranking loss with a novel sampling strategy to improve the quality of depth map prediction and introduces a new relative depth dataset of about 21K diverse high-resolution web stereo photos to enhance the generalization ability of the model.
Digging Into Self-Supervised Monocular Depth Estimation
It is shown that a surprisingly simple model, and associated design choices, lead to superior predictions, and together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods.
MegaDepth: Learning single-view depth prediction from internet photos
  • In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2018
Monocular Relative Depth Perception with Web Stereo Data Supervision
This work proposes a simple yet effective method to automatically generate dense relative depth annotations from web stereo images, and introduces an improved ranking loss that deals with imbalanced ordinal relations by forcing the network to focus on a set of hard pairs.
Single-Image Depth Perception in the Wild
Experiments show that the proposed algorithm, combined with existing RGB-D data and the new relative depth annotations, significantly improves single-image depth perception in the wild.
Listwise approach to learning to rank: theory and algorithm
A sufficient condition on consistency for ranking is given, which seems to be the first such result obtained in related research, and an analysis of three loss functions (likelihood loss, cosine loss, and cross entropy loss) is conducted.
From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation
This paper proposes a network architecture that uses novel local planar guidance layers at multiple stages of the decoding phase and outperforms state-of-the-art works by a significant margin on challenging benchmarks.
High Quality Monocular Depth Estimation via Transfer Learning
This work presents a convolutional neural network that computes a high-resolution depth map from a single RGB image with the help of transfer learning; it outperforms the state of the art on two datasets and produces qualitatively better results that capture object boundaries more faithfully.