Geometric Loss Functions for Camera Pose Regression with Deep Learning

@article{Kendall2017GeometricLF,
  title={Geometric Loss Functions for Camera Pose Regression with Deep Learning},
  author={Alex Kendall and Roberto Cipolla},
  journal={2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2017},
  pages={6555-6564}
}
  • Alex Kendall, Roberto Cipolla
  • Published 27 February 2017
  • Computer Science
  • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Deep learning has been shown to be effective for robust and real-time monocular image relocalisation. In particular, PoseNet [22] is a deep convolutional neural network which learns to regress the 6-DOF camera pose from a single image. It learns to localize using high-level features and is robust to difficult lighting, motion blur and unknown camera intrinsics, where point-based SIFT registration fails. However, it was trained using a naive loss function, with hyper-parameters which require… 
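The truncated abstract refers to PoseNet's naive loss, whose scale hyper-parameter must be hand-tuned per scene, and to the geometric, learned-weighting alternatives this paper explores. As a rough illustration only, here is a sketch of the beta-weighted PoseNet-style loss next to a learned homoscedastic-uncertainty weighting of position and orientation terms; it assumes PyTorch, and the tensor names, shapes, default beta and initial log-variances are illustrative rather than the authors' code.

```python
import torch

def posenet_style_loss(x_pred, q_pred, x_gt, q_gt, beta=500.0):
    """Naive PoseNet-style loss: ||x_pred - x_gt|| + beta * ||q_pred - q_gt/|q_gt|||.
    beta trades position against orientation error and is scene-dependent;
    the default here is purely illustrative."""
    q_gt = q_gt / q_gt.norm(dim=-1, keepdim=True)   # compare against a unit quaternion
    pos_err = (x_pred - x_gt).norm(dim=-1)
    rot_err = (q_pred - q_gt).norm(dim=-1)
    return (pos_err + beta * rot_err).mean()

class LearnedPoseLossWeighting(torch.nn.Module):
    """Learned weighting via homoscedastic uncertainty: the log-variances s_x and s_q
    are optimised jointly with the network, removing the hand-tuned beta."""
    def __init__(self, s_x=0.0, s_q=-3.0):          # initial values are illustrative
        super().__init__()
        self.s_x = torch.nn.Parameter(torch.tensor(float(s_x)))
        self.s_q = torch.nn.Parameter(torch.tensor(float(s_q)))

    def forward(self, x_pred, q_pred, x_gt, q_gt):
        q_gt = q_gt / q_gt.norm(dim=-1, keepdim=True)
        pos_err = (x_pred - x_gt).norm(dim=-1).mean()
        rot_err = (q_pred - q_gt).norm(dim=-1).mean()
        return pos_err * torch.exp(-self.s_x) + self.s_x \
             + rot_err * torch.exp(-self.s_q) + self.s_q
```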

Citing Papers

Back to the Feature: Learning Robust Camera Localization from Pixels to Pose

PixLoc is introduced, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model, based on the direct alignment of multiscale deep features, casting camera localization as metric learning.

Understanding the Limitations of CNN-Based Absolute Camera Pose Regression

A theoretical model for camera pose regression is developed that is more closely related to pose approximation via image retrieval than to accurate pose estimation via 3D structure, and shows that additional research is needed before pose regression algorithms are ready to compete with structure-based methods.

Adversarial Networks for Camera Pose Regression and Refinement

This work introduces a novel framework based at its core on the idea of implicitly learning the joint distribution of RGB images and their corresponding camera poses with a discriminator network and adversarial learning, which not only regresses the camera pose from a single image but also offers a solely RGB-based solution for camera pose refinement using the discriminator network.

Adversarial Joint Image and Pose Distribution Learning for Camera Pose Regression and Refinement

This work introduces a novel framework based at its core on the idea of modeling the joint distribution of RGB images and their corresponding camera poses using adversarial learning, which not only regresses the camera pose from a single image but also offers a solely RGB-based solution for camera pose refinement using the discriminator network.

Learning Multi-Scene Absolute Pose Regression with Transformers

This work proposes to learn multi-scene absolute camera pose regression with Transformers, where encoders aggregate activation maps with self-attention and decoders transform latent features and scene encodings into candidate pose predictions.
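As a loose sketch of that mechanism (assuming PyTorch; the class name, the single shared transformer, the dimensions and the omission of positional encodings are illustrative simplifications, not the cited architecture):

```python
import torch

class MultiScenePoseTransformer(torch.nn.Module):
    """Sketch of transformer-based multi-scene pose regression: CNN activation maps
    become a token sequence for the encoder, learned per-scene queries are decoded
    into candidate poses, and the query of the relevant scene is selected."""
    def __init__(self, num_scenes, cnn_channels=2048, d_model=256, nhead=8, layers=4):
        super().__init__()
        self.proj = torch.nn.Conv2d(cnn_channels, d_model, kernel_size=1)
        self.scene_queries = torch.nn.Parameter(torch.randn(num_scenes, d_model))
        self.transformer = torch.nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=layers, num_decoder_layers=layers, batch_first=True)
        self.head = torch.nn.Linear(d_model, 7)          # xyz + orientation quaternion

    def forward(self, feat_map, scene_idx):              # feat_map: (B, C, H, W)
        tokens = self.proj(feat_map).flatten(2).transpose(1, 2)       # (B, H*W, d)
        queries = self.scene_queries.unsqueeze(0).expand(feat_map.size(0), -1, -1)
        decoded = self.transformer(tokens, queries)                   # (B, num_scenes, d)
        poses = self.head(decoded)                        # one candidate pose per scene
        return poses[torch.arange(poses.size(0)), scene_idx]          # pick queried scene
```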

FINet: Feature Interactions Across Dimensions and Hierarchies for Camera Localization

A feature hierarchy is built from the shallow and deep layers of deep convolutional networks to accumulate more local appearance features, and it can effectively improve the accuracy of camera localization even with only one image as input.

Introduction to Camera Pose Estimation with Deep Learning

This work describes key methods in the field, identifies trends aimed at improving the original deep pose regression solution, and provides an extensive cross-comparison of existing learning-based pose estimators.

Boosting Image-Based Localization Via Randomly Geometric Data Augmentation

A geometric augmentation strategy can significantly improve localization accuracy in visual localization by applying random geometric augmentation (RGA) during training.

Relative Camera Pose Estimation using Synthetic Data with Domain Adaptation via Cycle-Consistent Adversarial Networks

This paper builds the Tuebingen Buildings dataset of RGB images collected by a drone in urban scenes, creates a 3D model for each scene, and proposes a relative camera pose estimation approach to solve the continuous localization problem for autonomous navigation of unmanned systems.

Adversarial Transfer of Pose Estimation Regression

A deep adaptation network is developed for learning scene-invariant image representations, using adversarial learning to generate such representations for model transfer; it is enriched with self-supervised learning, and adaptability theory is used to validate the existence of a scene-invariant representation of images in two given scenes.
...

References

Showing 1-10 of 61 references

PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization

This work trains a convolutional neural network to regress the 6-DOF camera pose from a single RGB image in an end-to-end manner with no need for additional engineering or graph optimisation, demonstrating that convnets can be used to solve complicated out-of-image-plane regression problems.
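For orientation, a minimal sketch of such an end-to-end pose regressor (assuming PyTorch and a torchvision ResNet backbone; the original PoseNet used a GoogLeNet backbone, and the names and sizes here are illustrative):

```python
import torch
import torchvision

class PoseRegressor(torch.nn.Module):
    """Minimal PoseNet-style regressor: a CNN backbone followed by separate
    fully connected heads for the 3-D position and the orientation quaternion."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        self.features = torch.nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        self.fc_xyz = torch.nn.Linear(feat_dim, 3)    # translation x, y, z
        self.fc_quat = torch.nn.Linear(feat_dim, 4)   # orientation quaternion w, x, y, z

    def forward(self, img):                           # img: (B, 3, H, W)
        f = self.features(img).flatten(1)             # (B, feat_dim)
        return self.fc_xyz(f), self.fc_quat(f)
```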

Modelling uncertainty in deep learning for camera relocalization

  • Alex Kendall, Roberto Cipolla
  • Computer Science
  • 2016 IEEE International Conference on Robotics and Automation (ICRA)
  • 2016
A Bayesian convolutional neural network is used to regress the 6-DOF camera pose from a single RGB image, and an estimate of the model's relocalization uncertainty is obtained, improving state-of-the-art localization accuracy on a large-scale outdoor dataset.
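A hedged sketch of the Monte Carlo dropout idea behind such uncertainty estimates (assuming PyTorch and a pose network that contains dropout layers and returns a position/orientation pair, e.g. a dropout-equipped variant of the regressor sketched above; everything here is illustrative):

```python
import torch

def mc_dropout_pose(model, img, n_samples=40):
    """Approximate Bayesian inference: keep dropout active at test time, draw several
    stochastic forward passes, and use the spread of the position samples as a
    relocalization-uncertainty estimate."""
    model.train()   # enables dropout (caveat: also puts BatchNorm in training mode)
    with torch.no_grad():
        xyz = torch.stack([model(img)[0] for _ in range(n_samples)])  # (n, B, 3)
    model.eval()
    return xyz.mean(dim=0), xyz.std(dim=0)   # pose estimate and per-axis uncertainty
```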

Image-Based Localization Using LSTMs for Structured Feature Correlation

Experimental results show the proposed CNN+LSTM architecture for camera pose regression for indoor and outdoor scenes outperforms existing deep architectures, and can localize images in hard conditions, where classic SIFT-based methods fail.

VidLoc: A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization

A recurrent model for performing 6-DoF localization of video-clips is proposed and it is found that, even by considering only short sequences, the pose estimates are smoothed and the localization error can be drastically reduced.
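A loose sketch of the recurrent formulation (assuming PyTorch; the bidirectional LSTM, the per-frame feature extractor and the head sizes are illustrative, not the cited architecture's exact configuration):

```python
import torch

class VideoClipPoseRNN(torch.nn.Module):
    """Sketch of recurrent pose estimation over a short video clip: per-frame CNN
    features pass through a bidirectional LSTM before regressing every frame's pose,
    so temporal context smooths the per-frame localization error."""
    def __init__(self, cnn, feat_dim=2048, hidden=256):
        super().__init__()
        self.cnn = cnn   # any per-frame feature extractor producing feat_dim features
        self.rnn = torch.nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = torch.nn.Linear(2 * hidden, 7)    # xyz + quaternion per frame

    def forward(self, clip):                          # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).flatten(1).view(b, t, -1)
        out, _ = self.rnn(feats)                      # (B, T, 2 * hidden)
        return self.head(out)                         # (B, T, 7) poses for all frames
```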

Image-based Localization with Spatial LSTMs

Experimental results show the proposed CNN+LSTM architecture for camera pose regression for indoor and outdoor scenes outperforms existing deep architectures, and can localize images in hard conditions, e.g., in the presence of mostly textureless surfaces.

Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images

We address the problem of inferring the pose of an RGB-D camera relative to a known 3D scene, given only a single acquired image. Our approach employs a regression forest that is capable of inferring…
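Scene-coordinate approaches recover the camera pose from predicted 2D-3D correspondences rather than regressing it directly. A minimal sketch of that final step (assuming OpenCV and NumPy; the regression model itself, forest or CNN, is outside the sketch, and the RANSAC parameters are illustrative):

```python
import numpy as np
import cv2

def pose_from_scene_coordinates(coords_3d, pixels_2d, K):
    """Given per-pixel 3-D scene-coordinate predictions and their 2-D pixel locations,
    recover the camera pose with PnP inside a RANSAC loop."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        coords_3d.astype(np.float64),    # (N, 3) predicted world coordinates
        pixels_2d.astype(np.float64),    # (N, 2) corresponding pixel locations
        K, None,                         # intrinsics; no lens distortion (assumption)
        reprojectionError=8.0, iterationsCount=256)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)           # world-to-camera rotation matrix
    return R, tvec                       # pose of the scene in the camera frame
```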

Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics

A principled approach to multi-task deep learning is proposed which weighs multiple loss functions by considering the homoscedastic uncertainty of each task, allowing us to simultaneously learn various quantities with different units or scales in both classification and regression settings.
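That weighting can be written as a sum of exp(-s_i) * L_i + s_i terms with learned log-variances s_i, generalizing the two-term pose-loss weighting sketched earlier to any number of tasks. A short sketch (assuming PyTorch; names are illustrative):

```python
import torch

class MultiTaskUncertaintyLoss(torch.nn.Module):
    """Weighs K task losses by learned homoscedastic (task-dependent) uncertainty:
    total = sum_i exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2) is a parameter."""
    def __init__(self, num_tasks):
        super().__init__()
        self.log_vars = torch.nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):       # iterable of scalar losses, one per task
        total = 0.0
        for s, loss in zip(self.log_vars, task_losses):
            total = total + torch.exp(-s) * loss + s
        return total
```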

Indoor Relocalization in Challenging Environments With Dual-Stream Convolutional Neural Networks

Deep learning is introduced to the indoor relocalization problem: a dual-stream CNN (a depth stream and a color stream) is used to realize 6-DOF pose regression in an end-to-end manner from RGB-D input in challenging environments.

Relative Camera Pose Estimation Using Convolutional Neural Networks

This paper presents a convolutional neural network-based approach for estimating the relative pose between two cameras. The proposed network takes RGB images from both cameras as input and directly…
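A loose sketch of such a relative pose network (assuming PyTorch and a torchvision backbone shared between the two views; the Siamese ResNet and the head sizes are illustrative, not the cited paper's exact architecture):

```python
import torch
import torchvision

class RelativePoseNet(torch.nn.Module):
    """Sketch of CNN-based relative pose estimation: a shared (Siamese) backbone
    encodes both images, the features are concatenated, and a small head regresses
    the relative translation and orientation between the two cameras."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        self.encoder = torch.nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        self.head = torch.nn.Sequential(
            torch.nn.Linear(2 * feat_dim, 512), torch.nn.ReLU(),
            torch.nn.Linear(512, 7))     # relative xyz + orientation quaternion

    def forward(self, img_a, img_b):
        fa = self.encoder(img_a).flatten(1)
        fb = self.encoder(img_b).flatten(1)
        return self.head(torch.cat([fa, fb], dim=1))
```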
...