End-to-End Learnable Geometric Vision by Backpropagating PnP Optimization

@article{Chen2020EndtoEndLG,
  title={End-to-End Learnable Geometric Vision by Backpropagating PnP Optimization},
  author={Bo Chen and {\'A}lvaro Parra and Jiewei Cao and Nan Li and Tat-Jun Chin},
  journal={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020},
  pages={8097-8106}
}
Deep networks excel at learning patterns from large amounts of data. On the other hand, many geometric vision tasks are specified as optimization problems. To seamlessly combine deep learning and geometric vision, it is vital to perform learning and geometric optimization end-to-end. Towards this aim, we present BPnP, a novel network module that backpropagates gradients through a Perspective-n-Point (PnP) solver to guide parameter updates of a neural network. Based on implicit differentiation…
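The abstract describes differentiating through a PnP solver via implicit differentiation. The following is a minimal, illustrative sketch of that idea in PyTorch, not the authors' released BPnP code: the forward pass calls an off-the-shelf solver, and the backward pass applies the implicit function theorem at the solver's optimum. The axis-angle pose parameterization and the Gauss-Newton approximation of the Hessian are simplifications assumed for this example.

```python
import cv2
import numpy as np
import torch


def reprojection_residuals(pose, pts2d, pts3d, K):
    """Residuals r = project(pts3d; pose) - pts2d, with pose = (rvec, tvec) as a 6-vector."""
    rvec, tvec = pose[:3], pose[3:]
    theta = torch.linalg.norm(rvec) + 1e-9
    k = rvec / theta
    # Rodrigues' formula written in torch so the residual stays differentiable.
    zero = torch.zeros((), dtype=pose.dtype)
    K_hat = torch.stack([
        torch.stack([zero, -k[2], k[1]]),
        torch.stack([k[2], zero, -k[0]]),
        torch.stack([-k[1], k[0], zero]),
    ])
    R = torch.eye(3, dtype=pose.dtype) + torch.sin(theta) * K_hat \
        + (1.0 - torch.cos(theta)) * (K_hat @ K_hat)
    cam = pts3d @ R.T + tvec          # model points in the camera frame
    uv = cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]       # perspective division
    return (uv - pts2d).reshape(-1)


class PnPWithGradient(torch.autograd.Function):
    """Forward: off-the-shelf PnP. Backward: implicit-function-theorem gradient."""

    @staticmethod
    def forward(ctx, pts2d, pts3d, K):
        _, rvec, tvec = cv2.solvePnP(
            pts3d.detach().cpu().double().numpy(),
            pts2d.detach().cpu().double().numpy(),
            K.detach().cpu().double().numpy(), None)
        pose = torch.from_numpy(np.r_[rvec.ravel(), tvec.ravel()]).to(pts2d.dtype)
        ctx.save_for_backward(pts2d, pts3d, K, pose)
        return pose

    @staticmethod
    def backward(ctx, grad_pose):
        pts2d, pts3d, K, pose = ctx.saved_tensors
        # At the PnP optimum g = J^T r = 0, with J = dr/dpose. Because r depends on
        # pts2d only through the (-pts2d) term, dg/dpts2d = -J^T exactly; with the
        # Gauss-Newton approximation dg/dpose ~= J^T J, the implicit function theorem
        # gives dpose/dpts2d ~= (J^T J)^-1 J^T.
        J = torch.autograd.functional.jacobian(
            lambda p: reprojection_residuals(p, pts2d, pts3d, K), pose)  # (2n, 6)
        dpose_dx = torch.linalg.solve(J.T @ J, J.T)                      # (6, 2n)
        grad_pts2d = (grad_pose @ dpose_dx).reshape(pts2d.shape)
        return grad_pts2d, None, None
```

With such a module, a task loss defined on the recovered pose can backpropagate into the network that predicts the 2D keypoints.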
On end-to-end 6DOF object pose estimation and robustness to object scale
  • 2021
This report contains a set of experiments that seek to reproduce the claims of two recent works related to keypoint estimation, one specific to 6DOF object pose estimation, and the other presenting…
Detecting Object Surface Keypoints From a Single RGB Image via Deep Learning Network for 6-DoF Pose Estimation
Techniques for defining 3D object surface keypoints and predicting their corresponding 2D counterparts via deep-learning network architectures are presented, and experimental results show that the proposed technique outperforms state-of-the-art approaches in both “2D projection” and “3D transformation” metrics.
MonoRUn: Monocular 3D Object Detection by Self-Supervised Reconstruction and Uncertainty Propagation
Object localization in 3D space is a challenging aspect in monocular 3D object detection. Recent advances in 6DoF pose estimation have shown that predicting dense 2D-3D correspondence maps between…
MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation
MonoRUn is a novel detection framework that learns dense correspondences and geometry in a self-supervised manner, with simple 3D bounding box annotations, and outperforms current state-of-the-art methods on the KITTI benchmark.
To The Point: Correspondence-driven monocular 3D category reconstruction
To The Point (TTP), a method for reconstructing 3D objects from a single image using 2D-to-3D correspondences learned from weak supervision, uses a simple per-sample optimization problem to replace CNN-based regression of camera pose and non-rigid deformation, and thereby obtains substantially more accurate 3D reconstructions.
RePOSE: Fast 6D Object Pose Refinement via Deep Texture Rendering
RePOSE leverages image rendering for fast feature extraction using a 3D model with a learnable texture, and uses differentiable Levenberg-Marquardt (LM) optimization to refine a pose quickly and accurately by minimizing the distance between the input and rendered image representations, without the need to zoom in.
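The entry above refers to differentiable Levenberg-Marquardt refinement. As a rough sketch of what a single refinement step looks like (the residual function here is a placeholder; RePOSE's actual residual compares learned deep-feature images):

```python
import torch

def lm_step(residual_fn, pose, lam=1e-2):
    """One Levenberg-Marquardt update: delta = -(J^T J + lam*I)^-1 J^T r."""
    r = residual_fn(pose)                                        # (m,) residual vector
    J = torch.autograd.functional.jacobian(residual_fn, pose)    # (m, 6) Jacobian
    H = J.T @ J + lam * torch.eye(pose.numel(), dtype=pose.dtype)
    delta = -torch.linalg.solve(H, J.T @ r)
    return pose + delta
```

Because every operation in the step is differentiable, the refinement itself can be unrolled and trained end-to-end.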
SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation
This work addresses the challenge of directly regressing all six degrees of freedom of the object pose in a cluttered environment from a single RGB image by means of novel reasoning about self-occlusion, establishing a two-layer representation for 3D objects that considerably enhances the accuracy of end-to-end 6D pose estimation.
EfficientPose: An efficient, accurate and scalable end-to-end 6D multi object pose estimation approach
EfficientPose is a new approach for 6D object pose estimation that achieves a new state-of-the-art accuracy of 97.35% in terms of the ADD(-S) metric on the widely used 6D pose estimation benchmark dataset Linemod using RGB input, while still running end-to-end at over 27 FPS.
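The ADD(-S) metric cited above measures the average distance between model points transformed by the ground-truth and predicted poses, with a closest-point variant for symmetric objects. A small sketch, assuming model points as an (n, 3) array and poses given as rotation R plus translation t:

```python
import numpy as np

def add_metric(pts, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding transformed model points."""
    gt = pts @ R_gt.T + t_gt
    pred = pts @ R_pred.T + t_pred
    return np.linalg.norm(gt - pred, axis=1).mean()

def add_s_metric(pts, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: for symmetric objects, use the closest-point distance instead."""
    gt = pts @ R_gt.T + t_gt
    pred = pts @ R_pred.T + t_pred
    d = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=2)  # pairwise distances
    return d.min(axis=1).mean()

# A pose is typically counted as correct when ADD(-S) < 0.1 * object diameter.
```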
RePOSE: Real-Time Iterative Rendering and Refinement for 6D Object Pose Estimation
The method, RePOSE, is a real-time iterative rendering and refinement algorithm for 6D pose estimation that achieves state-of-the-art accuracy and can render an image representation in under 1 ms.
Deep Learning on Monocular Object Pose Detection and Tracking: A Comprehensive Overview
This paper presents a comprehensive review of recent progress in deep-learning-based object pose detection and tracking, and compares the results of current state-of-the-art methods on several publicly available datasets.

References

Showing 1-10 of 61 references
End-to-End Learning of Geometry and Context for Deep Stereo Regression
We propose a novel deep learning architecture for regressing disparity from a rectified pair of stereo images. We leverage knowledge of the problem’s geometry to form a cost volume using deep feature…
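The cost volume mentioned here is built by pairing left-image features with right-image features shifted over candidate disparities. A rough sketch (tensor shapes and the zero-padding scheme are assumptions for the example):

```python
import torch

def build_cost_volume(feat_left, feat_right, max_disp):
    """feat_*: (B, C, H, W) feature maps; returns a (B, 2C, max_disp, H, W) cost volume."""
    B, C, H, W = feat_left.shape
    volume = feat_left.new_zeros(B, 2 * C, max_disp, H, W)
    for d in range(max_disp):
        if d == 0:
            volume[:, :C, d] = feat_left
            volume[:, C:, d] = feat_right
        else:
            # Concatenate left features with right features shifted by disparity d.
            volume[:, :C, d, :, d:] = feat_left[:, :, :, d:]
            volume[:, C:, d, :, d:] = feat_right[:, :, :, :-d]
    return volume
```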
DSAC — Differentiable RANSAC for Camera Localization
DSAC is applied to the problem of camera localization, where deep learning has so far failed to improve on traditional approaches, and it is demonstrated that by directly minimizing the expected loss of the output camera poses, robustly estimated by RANSAC, an increase in accuracy is achieved.
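The key trick in DSAC is replacing RANSAC's hard hypothesis selection with a soft, differentiable one. A toy sketch of the expected-loss objective (hypothesis sampling, scoring, and the task loss are placeholders, not DSAC's exact components):

```python
import torch

def dsac_expected_loss(hypotheses, scores, gt_pose, task_loss):
    """hypotheses: list of pose tensors; scores: (n,) differentiable hypothesis scores."""
    probs = torch.softmax(scores, dim=0)
    losses = torch.stack([task_loss(h, gt_pose) for h in hypotheses])
    # Differentiable surrogate for RANSAC's argmax: minimize the expected task loss.
    return (probs * losses).sum()
```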
Geometric Loss Functions for Camera Pose Regression with Deep Learning
  • Alex Kendall, R. Cipolla
  • Mathematics, Computer Science
  • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2017
A number of novel loss functions for learning camera pose which are based on geometry and scene reprojection error are explored, and it is shown how to automatically learn an optimal weighting to simultaneously regress position and orientation.
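The learned weighting referred to above is commonly written with a learnable log-variance per task. A sketch of that combined loss, assuming position and quaternion outputs (the exact norms and normalization used in the paper may differ):

```python
import torch
import torch.nn as nn

class PoseLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.s_x = nn.Parameter(torch.zeros(()))   # learnable log variance for position
        self.s_q = nn.Parameter(torch.zeros(()))   # learnable log variance for orientation

    def forward(self, pos_pred, pos_gt, quat_pred, quat_gt):
        loss_x = torch.norm(pos_pred - pos_gt, dim=-1).mean()
        loss_q = torch.norm(quat_pred - quat_gt, dim=-1).mean()
        # L = L_x * exp(-s_x) + s_x + L_q * exp(-s_q) + s_q
        return loss_x * torch.exp(-self.s_x) + self.s_x \
             + loss_q * torch.exp(-self.s_q) + self.s_q
```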
Deeper Depth Prediction with Fully Convolutional Residual Networks
A fully convolutional architecture, encompassing residual learning, is proposed to model the ambiguous mapping between monocular images and depth maps, along with a novel way to efficiently learn feature-map up-sampling within the network.
PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization
This work trains a convolutional neural network to regress the 6-DOF camera pose from a single RGB image in an end-to-end manner, with no need for additional engineering or graph optimisation, demonstrating that convnets can be used to solve complicated out-of-image-plane regression problems.
OptNet: Differentiable Optimization as a Layer in Neural Networks
OptNet is presented, a network architecture that integrates optimization problems (here, specifically in the form of quadratic programs) as individual layers in larger end-to-end trainable deep networks, and shows how techniques from sensitivity analysis, bilevel optimization, and implicit differentiation can be used to exactly differentiate through these layers.
Modelling uncertainty in deep learning for camera relocalization
  • Alex Kendall, R. Cipolla
  • Computer Science
  • 2016 IEEE International Conference on Robotics and Automation (ICRA)
  • 2016
A Bayesian convolutional neural network is used to regress the 6-DOF camera pose from a single RGB image and to obtain an estimate of the model's relocalization uncertainty, improving state-of-the-art localization accuracy on a large-scale outdoor dataset.
gvnn: Neural Network Library for Geometric Computer Vision
gvnn, a neural network library in Torch aimed at bridging the gap between classic geometric computer vision and deep learning, is introduced, and several new layers which are often used as parametric transformations on data in geometric computer vision are proposed.
Numerical Coordinate Regression with Convolutional Neural Networks
The differentiable spatial to numerical transform (DSNT) is proposed, which adds no trainable parameters, is fully differentiable, exhibits good spatial generalization, and offers a better trade-off between inference speed and prediction accuracy compared to existing techniques.
DeMoN: Depth and Motion Network for Learning Monocular Stereo
This work trains a convolutional network end-to-end to compute depth and camera motion from successive, unconstrained image pairs; in contrast to popular depth-from-single-image networks, DeMoN learns the concept of matching and generalizes better to structures not seen during training.