Robust Neural Routing Through Space Partitions for Camera Relocalization in Dynamic Indoor Environments

@article{dong2021robust,
  title={Robust Neural Routing Through Space Partitions for Camera Relocalization in Dynamic Indoor Environments},
  author={Siyan Dong and Qingnan Fan and He Wang and Ji Shi and Li Yi and Thomas A. Funkhouser and Baoquan Chen and Leonidas J. Guibas},
  journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}
  • Published 8 December 2020
  • Computer Science
Localizing the camera in a known indoor environment is a key building block for scene mapping, robot navigation, AR, etc. Recent advances estimate the camera pose via optimization over 2D-3D or 3D-3D correspondences established between coordinates in the 2D/3D camera space and the 3D world space. Such a mapping is estimated with either a convolutional neural network or a decision tree using only the static input image sequence, which makes these approaches vulnerable to dynamic indoor environments that…
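The 3D-3D branch of this correspondence-based pose estimation can be illustrated with the classical Kabsch (orthogonal Procrustes) alignment, which recovers the rigid transform between matched camera-space and world-space points. This is a generic sketch of that building block, not the paper's neural routing method; the function name is illustrative:

```python
import numpy as np

def estimate_pose_3d3d(cam_pts, world_pts):
    """Least-squares rigid transform (R, t) mapping camera-space points
    to world-space points, given N matched 3D-3D correspondences.
    cam_pts, world_pts: (N, 3) arrays."""
    mu_c = cam_pts.mean(axis=0)
    mu_w = world_pts.mean(axis=0)
    # Cross-covariance of the centered point sets
    H = (cam_pts - mu_c).T @ (world_pts - mu_w)
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections so R is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_w - R @ mu_c
    return R, t
```

In practice such a solver is wrapped in RANSAC so that outlier correspondences (e.g. from moving objects in dynamic scenes) do not corrupt the estimate.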


Visually plausible human-object interaction capture from wearable sensors
HOPS is the first method to capture interactions such as dragging objects and opening doors from egocentric data alone, allowing it to track objects even when they are not visible from the head camera.
CrowdDriven: A New Challenging Dataset for Outdoor Visual Localization
This work proposes a new benchmark for visual localization in outdoor scenes, using crowd-sourced data to cover a wide range of geographical regions and camera devices with a focus on the failure cases of current algorithms.
Multi-Modal Visual Place Recognition in Dynamics-Invariant Perception Space
This letter explores, for the first time, multi-modal fusion of semantic and visual modalities in a dynamics-invariant space to improve place recognition in dynamic environments. A novel deep learning architecture first generates the static semantic segmentation and recovers the static image directly from the corresponding dynamic image.
Projective Manifold Gradient Layer for Deep Rotation Regression
The proposed regularized projective manifold gradient (RPMG) method helps networks achieve new state-of-the-art performance in a variety of rotation estimation tasks and can be applied to other smooth manifolds such as the unit sphere.


Backtracking regression forests for accurate camera relocalization
A sample-balanced objective that encourages equal numbers of samples in the left and right sub-trees, and a novel backtracking scheme that remedies incorrect 2D-3D correspondence predictions, are proposed.
Geometry-Aware Learning of Maps for Camera Localization
This work proposes to represent maps as a deep neural net called MapNet, which enables learning a data-driven map representation and proposes a novel parameterization for camera rotation which is better suited for deep-learning based camera pose regression.
Full-Frame Scene Coordinate Regression for Image-Based Localization
This paper proposes to perform the scene coordinate regression in a full-frame manner to make the computation efficient at test time and to add more global context to the regression process to improve the robustness.
Learning Less is More - 6D Camera Localization via 3D Surface Regression
This work addresses the task of predicting the 6D camera pose from a single RGB image in a given 3D environment by developing a fully convolutional neural network for densely regressing so-called scene coordinates, defining the correspondence between the input image and the 3D scene space.
SANet: Scene Agnostic Network for Camera Localization
This paper presents a scene-agnostic neural architecture for camera localization, where model parameters and scenes are independent of each other, and which predicts a dense scene coordinate map of a query RGB image on the fly given an arbitrary scene.
Random forests versus Neural Networks — What's best for camera localization?
The experimental findings show that for scene coordinate regression, traditional NN architectures are superior to test-time-efficient RFs and ForestNets; however, this does not translate to final 6D camera pose accuracy, where RFs and ForestNets perform slightly better.
Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images
We address the problem of inferring the pose of an RGB-D camera relative to a known 3D scene, given only a single acquired image. Our approach employs a regression forest that is capable of inferring
On-the-Fly Adaptation of Regression Forests for Online Camera Relocalisation
This paper shows how to circumvent the need for offline training by adapting a pre-trained forest to a new scene on the fly, achieving relocalisation performance on par with that of offline forests; the approach runs in under 150 ms, making it desirable for real-time systems that require online relocalisation.
PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization
This work trains a convolutional neural network to regress the 6-DOF camera pose from a single RGB image in an end-to-end manner, with no need for additional engineering or graph optimisation, demonstrating that convnets can be used to solve complicated out-of-image-plane regression problems.
Geometric Loss Functions for Camera Pose Regression with Deep Learning
  • Alex Kendall, R. Cipolla
  • Computer Science
  • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
A number of novel loss functions for learning camera pose which are based on geometry and scene reprojection error are explored, and it is shown how to automatically learn an optimal weighting to simultaneously regress position and orientation.
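The automatic weighting mentioned above can be understood as learnable log-variance terms that balance the position and orientation losses, so neither dominates training. A minimal sketch under that interpretation; the parameter names `s_x` and `s_q` are illustrative, not taken from the paper:

```python
import numpy as np

def weighted_pose_loss(pos_err, rot_err, s_x, s_q):
    """Combine position and orientation errors with learnable
    log-variance weights s_x, s_q. Each exp(-s) factor down-weights
    its error term, while the additive s penalizes setting the
    weight arbitrarily low; minimizing over s_x, s_q jointly with
    the network balances the two objectives automatically."""
    return pos_err * np.exp(-s_x) + s_x + rot_err * np.exp(-s_q) + s_q
```

With `s_x = s_q = 0` this reduces to an unweighted sum of the two errors; during training the `s` terms would be optimized alongside the network weights.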