Let's Take This Online: Adapting Scene Coordinate Regression Network Predictions for Online RGB-D Camera Relocalisation

@article{Cavallari2019LetsTT,
  title={Let's Take This Online: Adapting Scene Coordinate Regression Network Predictions for Online RGB-D Camera Relocalisation},
  author={Tommaso Cavallari and Luca Bertinetto and Jishnu Mukhoti and Philip H. S. Torr and Stuart Golodetz},
  journal={2019 International Conference on 3D Vision (3DV)},
  year={2019},
  pages={564-573}
}
Many applications require a camera to be relocalised online, without expensive offline training on the target scene. […] Key Method: Our approach replaces the appearance clustering performed by the branching structure of a regression forest with a two-step process that first uses the network to predict points in the original scene, and then uses these predicted points to look up clusters of points from the new scene. We show experimentally that our online approach achieves state-of-the-art performance on both…
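The key-method sentence above compresses the whole adaptation idea: a scene-coordinate network trained offline on some original scene is kept frozen, and its predictions are used only as indices into clusters of 3D points gathered online from the new scene. The sketch below (Python/NumPy) illustrates that two-step process under stated assumptions: the grid-cell quantisation of predictions, the callable network interface, the backprojection helper and the plain Kabsch solver (without the RANSAC loop a real relocaliser would use) are all illustrative choices, not the authors' implementation.

```python
# Minimal sketch of the two-step adaptation described above (all details are assumptions).
import numpy as np
from collections import defaultdict

CELL = 0.1  # assumed cell size (metres) used to quantise predictions into cluster keys


def backproject(depth, K, pose):
    """Lift a depth map into 3D points, transformed by a camera-to-world pose."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return (pose[:3, :3] @ pts.T).T + pose[:3, 3]


def kabsch(cam_pts, world_pts):
    """Closed-form rigid alignment; a real relocaliser would wrap this in RANSAC."""
    cc, cw = cam_pts.mean(0), world_pts.mean(0)
    H = (cam_pts - cc).T @ (world_pts - cw)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))]) @ U.T
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, cw - R @ cc
    return T  # camera-to-world pose


class OnlineAdapter:
    def __init__(self, net):
        self.net = net                     # assumed callable: rgb -> (H, W, 3) original-scene points
        self.clusters = defaultdict(list)  # predicted-cell -> 3D points from the new scene

    def key(self, p):
        return tuple(np.floor(p / CELL).astype(int))

    def integrate(self, rgb, depth, pose, K):
        """Adaptation: fill the clusters from a tracked RGB-D frame of the new scene."""
        pred = self.net(rgb).reshape(-1, 3)      # step 1: predict points in the ORIGINAL scene
        world = backproject(depth, K, pose)      # same pixels, expressed in the NEW scene
        for p, w in zip(pred, world):
            self.clusters[self.key(p)].append(w)  # step 2: predictions index new-scene clusters

    def relocalise(self, rgb, depth, K):
        """Relocalisation: predictions look up clusters, yielding 3D-3D correspondences."""
        pred = self.net(rgb).reshape(-1, 3)
        cam = backproject(depth, K, np.eye(4))
        pairs = [(c, np.mean(self.clusters[self.key(p)], axis=0))
                 for p, c in zip(pred, cam) if self.key(p) in self.clusters]
        cam_pts, world_pts = map(np.asarray, zip(*pairs))
        return kabsch(cam_pts, world_pts)
```

In this sketch, integrate() would be called for every tracked RGB-D frame of the new scene, while relocalise() would be called whenever tracking is lost.
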
Citations

Hierarchical Scene Coordinate Classification and Regression for Visual Localization
TLDR
This work presents a new hierarchical scene coordinate network that predicts pixel scene coordinates in a coarse-to-fine manner from a single RGB image, together with a hybrid approach that outperforms existing scene coordinate regression methods and significantly reduces the performance gap w.r.t. explicit feature matching methods.
Beyond Controlled Environments: 3D Camera Re-Localization in Changing Indoor Scenes
TLDR
This paper adapts 3RScan - a recently introduced indoor RGB-D dataset designed for object instance re-localization - to create RIO10, a new long-term camera re-localization benchmark focused on indoor scenes, and explores how state-of-the-art camera re-localizers perform according to the proposed metrics.
CrowdDriven: A New Challenging Dataset for Outdoor Visual Localization
TLDR
This work proposes a new benchmark for visual localization in outdoor scenes, using crowd-sourced data to cover a wide range of geographical regions and camera devices, with a focus on the failure cases of current algorithms.
Reference Pose Generation for Visual Localization via Learned Features and View Synthesis
TLDR
A semi-automated approach is proposed that generates reference poses via feature matching between renderings of a 3D model and real images using learned features, showing that state-of-the-art visual localization methods perform better than predicted by the original reference poses.
On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation
TLDR
This paper analyzes two widely used re-localisation datasets, shows that evaluation outcomes indeed vary with the choice of the reference algorithm, and questions common beliefs in the re-localisation literature, namely that learning-based scene coordinate regression outperforms classical feature-based methods and that RGB-D-based methods outperform RGB-based methods.
Scene Coordinate Regression with Point Clouds for RGB Camera Relocalization
TLDR
This work shows strong evidence that 3D points optimized under multi-view constraints - such as epipolar constraints, reprojection errors, photometric consistency and global visibility - are effective for training the SCoRe network for outdoor relocalization.
Using Image Sequences for Long-Term Visual Localization
TLDR
A sequence-based localization pipeline that combines odometry with both a coarse and a fine localization module is described, and it is shown that SIFT features can perform on par with modern state-of-the-art features in this framework, despite being much weaker and an order of magnitude faster to compute.
Benchmarking Image Retrieval for Visual Localization
TLDR
It is shown that retrieval performance on classical landmark retrieval/recognition tasks correlates with localization performance only for some but not all tasks, indicating a need for retrieval approaches specifically designed for localization.
Is Geometry Enough for Matching in Visual Localization?
TLDR
GoMatch, an alternative to visual-based matching that relies solely on geometric information to match image keypoints to maps represented as sets of bearing vectors, confirms its potential and feasibility for real-world localization and opens the door to future city-scale visual localization methods that do not require storing visual descriptors.
Pan-tilt-zoom SLAM for Sports Videos
TLDR
An online SLAM system specifically designed to track pan-tilt-zoom (PTZ) cameras in highly dynamic sports such as basketball and soccer games, using rays as landmarks in mapping to overcome the missing depth of pure-rotation cameras.

References

SHOWING 1-10 OF 78 REFERENCES
On-the-Fly Adaptation of Regression Forests for Online Camera Relocalisation
TLDR
This paper shows how to circumvent this limitation by adapting a pre-trained forest to a new scene on the fly, achieving relocalisation performance on par with that of offline-trained forests; the approach runs in under 150 ms, making it desirable for real-time systems that require online relocalisation.
Geometric Loss Functions for Camera Pose Regression with Deep Learning
  • Alex Kendall, R. Cipolla
  • Computer Science
    2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2017
TLDR
A number of novel loss functions for learning camera pose, based on geometry and scene reprojection error, are explored, and it is shown how to automatically learn an optimal weighting for simultaneously regressing position and orientation.
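As a concrete illustration of the learned weighting mentioned in this entry, the PyTorch sketch below balances position and orientation errors with two learnable log-variance parameters, in the spirit of the homoscedastic-uncertainty formulation; the initial values, norm choices and quaternion handling are assumptions for illustration rather than the paper's exact definitions.

```python
import torch
import torch.nn as nn


class WeightedPoseLoss(nn.Module):
    """Pose loss with a learned balance between position and orientation errors."""

    def __init__(self, init_s_x=0.0, init_s_q=-3.0):        # initial log-variances: assumed values
        super().__init__()
        self.s_x = nn.Parameter(torch.tensor(init_s_x))      # learned log-variance, translation term
        self.s_q = nn.Parameter(torch.tensor(init_s_q))      # learned log-variance, rotation term

    def forward(self, pred_t, gt_t, pred_q, gt_q):
        # position error (Euclidean) and orientation error (distance between unit quaternions)
        l_x = (pred_t - gt_t).norm(dim=-1).mean()
        q = pred_q / pred_q.norm(dim=-1, keepdim=True)
        l_q = (q - gt_q).norm(dim=-1).mean()
        # each term is down-weighted by exp(-s); the +s terms keep the learned variances bounded
        return (l_x * torch.exp(-self.s_x) + self.s_x
                + l_q * torch.exp(-self.s_q) + self.s_q)
```

The two parameters are optimised jointly with the network weights, so the relative weighting of position and orientation no longer has to be tuned by hand.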
Real-Time RGB-D Camera Pose Estimation in Novel Scenes Using a Relocalisation Cascade
TLDR
An extension of this work achieves significantly better relocalisation performance whilst running fully in real time, presents a novel way of visualising the internal behaviour of the forests, and uses the insights gleaned to show how to entirely circumvent the need to pre-train a forest on a generic scene.
Random forests versus Neural Networks — What's best for camera localization?
TLDR
The experimental findings show that, for scene coordinate regression, traditional NN architectures are superior to test-time-efficient RFs and ForestNets; however, this does not translate to final 6D camera pose accuracy, where RFs and ForestNets perform slightly better.
PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization
TLDR
This work trains a convolutional neural network to regress the 6-DOF camera pose from a single RGB image in an end-to-end manner, with no need for additional engineering or graph optimisation, demonstrating that convnets can be used to solve complicated out-of-image-plane regression problems.
Scene Coordinate Regression with Angle-Based Reprojection Loss for Camera Relocalization
TLDR
This paper presents a new angle-based reprojection loss, which resolves the issues of the original reprojection loss and enables the system to utilize available multi-view constraints, further improving performance.
Full-Frame Scene Coordinate Regression for Image-Based Localization
TLDR
This paper proposes to perform the scene coordinate regression in a full-frame manner to make the computation efficient at test time, and to add more global context to the regression process to improve robustness.
Modelling uncertainty in deep learning for camera relocalization
  • Alex Kendall, R. Cipolla
  • Computer Science
    2016 IEEE International Conference on Robotics and Automation (ICRA)
  • 2016
TLDR
A Bayesian convolutional neural network is used to regress the 6-DOF camera pose from a single RGB image, and an estimate of the model's relocalization uncertainty is obtained, improving state-of-the-art localization accuracy on a large-scale outdoor dataset.
Learning Less is More - 6D Camera Localization via 3D Surface Regression
TLDR
This work addresses the task of predicting the 6D camera pose from a single RGB image in a given 3D environment by developing a fully convolutional neural network that densely regresses so-called scene coordinates, which define the correspondence between the input image and the 3D scene space.
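To make the densely regressed scene coordinates mentioned above concrete, the sketch below shows the conventional way such dense 2D-3D correspondences are turned into a 6D pose at test time, using OpenCV's PnP-RANSAC as a stand-in for the paper's differentiable hypothesise-and-verify stage; the pixel subsampling and thresholds are illustrative assumptions.

```python
import cv2
import numpy as np


def pose_from_scene_coordinates(scene_coords, K, subsample=8):
    """scene_coords: (H, W, 3) network output mapping each pixel to a 3D scene point."""
    h, w, _ = scene_coords.shape
    v, u = np.mgrid[0:h:subsample, 0:w:subsample]             # a sparse grid of query pixels
    pts_2d = np.stack([u, v], axis=-1).reshape(-1, 2).astype(np.float64)
    pts_3d = scene_coords[v, u].reshape(-1, 3).astype(np.float64)

    # robustly fit a pose to the 2D-3D correspondences (assumed thresholds)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d, pts_2d, K, distCoeffs=None,
        reprojectionError=10.0, iterationsCount=256)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)                                 # world-to-camera rotation
    pose = np.eye(4)
    pose[:3, :3], pose[:3, 3] = R.T, (-R.T @ tvec).ravel()     # invert to get camera-to-world
    return pose, inliers
```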
Self-Supervised Visual Descriptor Learning for Dense Correspondence
TLDR
A new approach to learning visual descriptors for dense correspondence estimation is advocated in which the power of a strong three-dimensional generative model is harnessed to automatically label correspondences in RGB-D video data.