Let's Take This Online: Adapting Scene Coordinate Regression Network Predictions for Online RGB-D Camera Relocalisation

@article{Cavallari2019LetsTT,
  title={Let's Take This Online: Adapting Scene Coordinate Regression Network Predictions for Online RGB-D Camera Relocalisation},
  author={Tommaso Cavallari and Luca Bertinetto and Jishnu Mukhoti and Philip H. S. Torr and Stuart Golodetz},
  journal={2019 International Conference on 3D Vision (3DV)},
  year={2019},
  pages={564--573}
}
Many applications require a camera to be relocalised online, without expensive offline training on the target scene. […]
Key Method
Our approach replaces the appearance clustering performed by the branching structure of a regression forest with a two-step process that first uses the network to predict points in the original scene, and then uses these predicted points to look up clusters of points from the new scene. We show experimentally that our online approach achieves state-of-the-art performance on both…
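The look-up step described above can be sketched minimally. All names here (`AdaptiveLookupTable`, voxel quantisation as the indexing scheme, the capacity cap) are illustrative assumptions, not the paper's actual data structure: the pretrained network's scene-coordinate prediction is used only as a key, and the stored values are 3D points observed in the new scene.

```python
import numpy as np

class AdaptiveLookupTable:
    """Hypothetical sketch of the two-step adaptation: the pretrained
    network predicts a point in the *original* scene, which indexes a
    voxel grid whose cells hold reservoirs of *new*-scene 3D points."""

    def __init__(self, voxel_size=0.1, capacity=1024):
        self.voxel_size = voxel_size
        self.capacity = capacity
        self.cells = {}  # voxel index -> list of new-scene 3D points

    def _key(self, pred_point):
        # Quantise the network's prediction into a voxel index.
        return tuple(np.floor(np.asarray(pred_point) / self.voxel_size).astype(int))

    def add(self, pred_point, new_scene_point):
        # Store an observed new-scene point under the cell the
        # network's prediction falls into, up to a fixed capacity.
        cell = self.cells.setdefault(self._key(pred_point), [])
        if len(cell) < self.capacity:
            cell.append(np.asarray(new_scene_point))

    def lookup(self, pred_point):
        # At relocalisation time, the prediction retrieves the cluster
        # of new-scene points gathered online for that cell.
        return self.cells.get(self._key(pred_point), [])
```

A RANSAC-style pose solver would then consume these looked-up clusters as 2D-3D correspondences, exactly as it would consume regression-forest leaf modes.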
Decoupling Features and Coordinates for Few-shot RGB Relocalization
TLDR
A decoupled solution, in which feature extraction, coordinate regression and pose estimation are performed separately, is used to approach camera relocalization, with the key insight that robust and discriminative image features used for coordinate regression should be learned by removing the distracting factor of camera views.
Back to the Feature: Learning Robust Camera Localization from Pixels to Pose
TLDR
PixLoc is introduced, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model, based on the direct alignment of multiscale deep features, casting camera localization as metric learning.
Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision
TLDR
The relative pose regression method matches the accuracy of absolute pose regression networks, while retaining the relative-pose models’ test-time speed and ability to generalize to non-training scenes.
Hierarchical Scene Coordinate Classification and Regression for Visual Localization
TLDR
This work presents a new hierarchical scene coordinate network that predicts pixel scene coordinates in a coarse-to-fine manner from a single RGB image, and presents a hybrid approach that outperforms existing scene coordinate regression methods and significantly reduces the performance gap w.r.t. explicit feature matching methods.
Continual Learning for Image-Based Camera Localization
TLDR
This paper approaches the problem of visual localization in a continual learning setup, whereby the model is trained on scenes in an incremental manner, and proposes a new sampling method based on coverage score (Buff-CS) that adapts existing sampling strategies in the buffering process to the problem of visual localization.
Beyond Controlled Environments: 3D Camera Re-Localization in Changing Indoor Scenes
TLDR
This paper adapts 3RScan, a recently introduced indoor RGB-D dataset designed for object instance re-localization, to create RIO10, a new long-term camera re-localization benchmark focused on indoor scenes, and explores how state-of-the-art camera re-localizers perform according to these metrics.
Visual Camera Re-Localization from RGB and RGB-D Images Using DSAC
TLDR
A learning-based system that estimates the camera position and orientation from a single input image relative to a known environment, using a deep neural network and fully differentiable pose optimization; it achieves state-of-the-art accuracy on various public datasets for RGB-based re-localization, and competitive accuracy for RGB-D-based re-localization.
Reference Pose Generation for Long-term Visual Localization via Learned Features and View Synthesis
TLDR
This work proposes a semi-automated approach to generate reference poses based on feature matching between renderings of a 3D model and real images via learned features, and significantly improves the nighttime reference poses of the popular Aachen Day–Night dataset.
CrowdDriven: A New Challenging Dataset for Outdoor Visual Localization
TLDR
This work proposes a new benchmark for visual localization in outdoor scenes, using crowd-sourced data to cover a wide range of geographical regions and camera devices with a focus on the failure cases of current algorithms.
Reference Pose Generation for Visual Localization via Learned Features and View Synthesis
TLDR
A semi-automated approach to generate reference poses based on feature matching between renderings of a 3D model and real images via learned features is proposed, showing that state-of-the-art visual localization methods perform better than predicted by the original reference poses.
...

References

Showing 1-10 of 77 references
On-the-Fly Adaptation of Regression Forests for Online Camera Relocalisation
TLDR
This paper shows how to circumvent this limitation by adapting a pre-trained forest to a new scene on the fly, achieving relocalisation performance on par with that of offline forests; the approach runs in under 150 ms, making it desirable for real-time systems that require online relocalisation.
Geometric Loss Functions for Camera Pose Regression with Deep Learning
  • Alex Kendall, R. Cipolla
  • Computer Science
    2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2017
TLDR
A number of novel loss functions for learning camera pose, based on geometry and scene reprojection error, are explored, and it is shown how to automatically learn an optimal weighting to simultaneously regress position and orientation.
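The "automatically learned optimal weighting" in this summary is, in the paper, a homoscedastic-uncertainty formulation: two learnable log-variance terms balance the position and orientation losses. A sketch with illustrative variable names (`s_x`, `s_q` would be trainable parameters in a real model):

```python
import math

def weighted_pose_loss(pos_err, ori_err, s_x, s_q):
    """Homoscedastic-uncertainty weighting in the style of Kendall &
    Cipolla (CVPR 2017): each task loss is scaled by exp(-s) and a +s
    regulariser stops the network from driving both weights to zero.
    Variable names here are illustrative, not the paper's notation."""
    return pos_err * math.exp(-s_x) + s_x + ori_err * math.exp(-s_q) + s_q
```

Because `s_x` and `s_q` are optimised jointly with the network, no manual grid search over the position/orientation trade-off is needed.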
Real-Time RGB-D Camera Pose Estimation in Novel Scenes Using a Relocalisation Cascade
TLDR
An extension of this work that achieves significantly better relocalisation performance whilst running fully in real time; it also presents a novel way of visualising the internal behaviour of the forests, and uses the insights gleaned from this to show how to entirely circumvent the need to pre-train a forest on a generic scene.
Random forests versus Neural Networks — What's best for camera localization?
TLDR
The experimental findings show that for scene coordinate regression, traditional NN architectures are superior to test-time-efficient RFs and ForestNets; however, this does not translate to final 6D camera pose accuracy, where RFs and ForestNets perform slightly better.
PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization
TLDR
This work trains a convolutional neural network to regress the 6-DOF camera pose from a single RGB image in an end-to-end manner, with no need for additional engineering or graph optimisation, demonstrating that convnets can be used to solve complicated out-of-image-plane regression problems.
Scene Coordinate Regression with Angle-Based Reprojection Loss for Camera Relocalization
TLDR
This paper presents a new angle-based reprojection loss, which resolves the issues of the original reprojection loss and enables the system to utilize available multi-view constraints, further improving performance.
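One plausible reading of the angle-based idea, sketched under assumptions (the paper's exact formulation should be consulted): penalise the angle between the ray through the observed pixel and the ray to the predicted scene coordinate expressed in the camera frame. Unlike the standard pixel reprojection error, this stays finite and well defined even for points predicted behind the camera.

```python
import numpy as np

def angle_based_loss(pred_cam, pixel_ray):
    """Angle between two viewing rays, both in the camera frame.
    `pred_cam` is the predicted scene coordinate transformed into camera
    coordinates; `pixel_ray` is the back-projected ray through the pixel.
    A sketch, not the paper's exact loss."""
    a = np.asarray(pred_cam, dtype=float)
    b = np.asarray(pixel_ray, dtype=float)
    a /= np.linalg.norm(a)
    b /= np.linalg.norm(b)
    # Clip to guard against floating-point drift outside [-1, 1].
    return float(np.arccos(np.clip(a @ b, -1.0, 1.0)))
```

A perfectly predicted point lies on the pixel's ray and incurs zero loss; the penalty grows smoothly with angular deviation regardless of depth.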
Full-Frame Scene Coordinate Regression for Image-Based Localization
TLDR
This paper proposes to perform the scene coordinate regression in a full-frame manner to make the computation efficient at test time and to add more global context to the regression process to improve the robustness.
Modelling uncertainty in deep learning for camera relocalization
  • Alex Kendall, R. Cipolla
  • Computer Science
    2016 IEEE International Conference on Robotics and Automation (ICRA)
  • 2016
TLDR
A Bayesian convolutional neural network is used to regress the 6-DOF camera pose from a single RGB image, obtaining an estimate of the model's relocalization uncertainty and improving state-of-the-art localization accuracy on a large-scale outdoor dataset.
Learning Less is More - 6D Camera Localization via 3D Surface Regression
TLDR
This work addresses the task of predicting the 6D camera pose from a single RGB image in a given 3D environment by developing a fully convolutional neural network for densely regressing so-called scene coordinates, defining the correspondence between the input image and the 3D scene space.
Self-Supervised Visual Descriptor Learning for Dense Correspondence
TLDR
A new approach to learning visual descriptors for dense correspondence estimation is advocated in which the power of a strong three-dimensional generative model is harnessed to automatically label correspondences in RGB-D video data.
...