Corpus ID: 218581404

Reference Pose Generation for Visual Localization via Learned Features and View Synthesis

Zichao Zhang, Torsten Sattler, Davide Scaramuzza
Visual localization is one of the key enabling technologies for autonomous driving and augmented reality. High-quality datasets with accurate 6 Degree-of-Freedom (DoF) reference poses are the foundation for benchmarking and improving existing methods. Traditionally, reference poses have been obtained via Structure-from-Motion (SfM). However, SfM itself relies on local features, which are prone to fail when images were taken under different conditions, e.g., day/night changes. At the same time…
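Benchmarks built on such reference poses typically report two numbers per query: the position error between camera centers and the angular error of the relative rotation. A minimal numpy sketch of that standard error computation (the function name and the world-to-camera pose convention are my assumptions, not taken from the paper):

```python
import numpy as np

def pose_error(R_est, t_est, R_gt, t_gt):
    """Position and orientation error between an estimated and a
    reference 6-DoF camera pose (R: 3x3 rotation, t: 3-vector),
    assuming world-to-camera convention x_cam = R x_world + t."""
    # Position error: Euclidean distance between camera centers,
    # where the camera center is c = -R^T t.
    c_est = -R_est.T @ t_est
    c_gt = -R_gt.T @ t_gt
    pos_err = np.linalg.norm(c_est - c_gt)
    # Orientation error: rotation angle of the relative rotation
    # R_est^T R_gt, recovered from its trace.
    cos_angle = np.clip((np.trace(R_est.T @ R_gt) - 1.0) / 2.0, -1.0, 1.0)
    rot_err_deg = np.degrees(np.arccos(cos_angle))
    return pos_err, rot_err_deg
```

Localization benchmarks then report the fraction of queries whose errors fall below thresholds such as (0.25 m, 2°).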
ReLoc: Indoor Visual Localization with Hierarchical Sitemap and View Synthesis
Based on the key observation that planar surfaces such as floors or walls are consistently present in indoor scenes, a novel system is proposed that incorporates geometric information to address the limitations of relying on images alone; it outperforms state-of-the-art approaches in visual localization validity and accuracy.
Benchmarking Image Retrieval for Visual Localization
It is shown that retrieval performance on classical landmark retrieval/recognition tasks correlates with localization performance for some but not all tasks, indicating a need for retrieval approaches specifically designed for localization.
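The retrieval step these benchmarks evaluate usually ranks database images by similarity of global image descriptors. A minimal sketch of that ranking under the common cosine-similarity choice (function and variable names are mine; real pipelines use learned descriptors such as NetVLAD, which this sketch does not implement):

```python
import numpy as np

def retrieve(query_desc, db_descs, k=3):
    """Rank database images by cosine similarity of L2-normalised
    global descriptors; return indices of the top-k matches."""
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q                 # cosine similarity per database image
    return np.argsort(-sims)[:k]  # best-first
```

For localization, the poses of the top-k retrieved images then seed pose approximation or local feature matching against their 3D structure.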
Retrieval and Localization with Observation Constraints
This work proposes an integrated visual re-localization method called RLOCS that combines image retrieval, semantic consistency, and geometric verification to achieve accurate pose estimates, yielding performance improvements on challenging localization benchmarks.
Using Image Sequences for Long-Term Visual Localization
A sequence-based localization pipeline is described that combines odometry with both a coarse and a fine localization module; it is shown that SIFT features can perform on par with modern state-of-the-art features in this framework, despite being much weaker and an order of magnitude faster to compute.
Large-scale Localization Datasets in Crowded Indoor Spaces
A robust LiDAR SLAM system is developed that provides initial poses, which are then refined using a novel structure-from-motion-based optimization; a benchmark of modern visual localization algorithms on these challenging datasets is presented, showing superior performance of structure-based methods using robust image features.
Image Stylization for Robust Features
This work uses trained feature networks to compete in the Long-Term Visual Localization and Map-based Localization for Autonomous Driving challenges, achieving competitive scores, and shows that image stylization in addition to color augmentation is a powerful method for learning robust features.
Domain Adaptation of Learned Features for Visual Localization
This work proposes a few-shot domain adaptation framework for learned local features that deals with varying conditions in visual localization, demonstrating superior performance over baselines while using only a small number of training examples from the target domain.
Day to Night Image Style Transfer with Light Control
This work addresses the challenging problem of data augmentation and proposes a novel approach to day-to-night image translation with 3D-aware light control that is on par with, or even outperforms, competitive state-of-the-art methods for image translation.
ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching
ClusterGNN is proposed, an attentional GNN architecture that learns the feature matching task by operating on clusters, using a progressive clustering module to adaptively divide keypoints into subgraphs to reduce redundant connectivity, and employing a coarse-to-fine paradigm to mitigate misclassification within images.
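Learned matchers like ClusterGNN replace the classical baseline of matching local descriptors by mutual nearest neighbours. That baseline can be sketched in a few lines of numpy (the function name is mine; this is the simple baseline, not ClusterGNN's graph neural network):

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """Match two sets of local descriptors (rows) by mutual nearest
    neighbours in Euclidean distance; return (i, j) index pairs."""
    # Pairwise squared distances between all descriptor pairs.
    d2 = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(-1)
    nn_ab = d2.argmin(axis=1)  # best match in B for each row of A
    nn_ba = d2.argmin(axis=0)  # best match in A for each row of B
    # Keep a pair only if each descriptor is the other's nearest.
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]
```

The mutual check discards many one-sided (and hence likely wrong) matches; learned matchers go further by reasoning over all keypoints jointly.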
Robust Image Retrieval-based Visual Localization using Kapture
This paper presents kapture, a flexible, open-source data format and processing pipeline for structure-from-motion and visual localization, based on robust image retrieval for coarse camera pose estimation and robust local features for accurate pose refinement.
Understanding the Limitations of CNN-Based Absolute Camera Pose Regression
A theoretical model for camera pose regression is developed, showing that pose regression is more closely related to pose approximation via image retrieval than to accurate pose estimation via 3D structure, and that additional research is needed before pose regression algorithms are ready to compete with structure-based methods.
Local Supports Global: Deep Camera Relocalization With Sequence Enhancement
This work exploits spatial-temporal consistency in sequential images to alleviate uncertainty due to visual ambiguities by incorporating a visual odometry (VO) component, and introduces two effective steps called content-augmented pose estimation and motion-based refinement.
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions
This paper introduces the first benchmark datasets specifically designed for analyzing the impact of day-night changes, weather, and seasonal variations on visual localization, highlighting the value of sequence-based localization approaches and the need for better local features.
Real-Time RGB-D Camera Pose Estimation in Novel Scenes Using a Relocalisation Cascade
An extension of earlier work achieves significantly better relocalisation performance while running fully in real time; a novel way of visualising the internal behaviour of the forests is presented, and the insights gleaned are used to show how to entirely circumvent the need to pre-train a forest on a generic scene.
To Learn or Not to Learn: Visual Localization from Essential Matrices
A novel framework for visual localization from relative poses is proposed, replacing components of the classical approach with learned alternatives at various levels, and the reasons why learned approaches do not perform well are identified.
InLoc: Indoor Visual Localization with Dense Matching and View Synthesis
A new large-scale visual localization method targeted at indoor environments is presented that significantly outperforms current state-of-the-art indoor localization approaches on a new, challenging dataset.
Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization?
It is demonstrated experimentally that large-scale 3D models are not strictly necessary for accurate visual localization, and it is shown that combining image-based methods with local reconstructions yields pose accuracy similar to state-of-the-art structure-based methods.
Deep Auxiliary Learning for Visual Localization and Odometry
This work proposes VLocNet, a new convolutional neural network architecture for 6-DoF global pose regression and odometry estimation from consecutive monocular images, together with a novel loss function that uses auxiliary learning to leverage relative pose information during training, thereby constraining the search space to obtain consistent pose estimates.
Prior Guided Dropout for Robust Visual Localization in Dynamic Environments
This paper proposes a framework that can be generally applied to existing CNN-based pose regressors to improve their robustness in dynamic environments; it includes a prior-guided dropout module coupled with a self-attention module that guides the CNN to ignore foreground objects during both training and inference.
Large scale joint semantic re-localisation and scene understanding via globally unique instance coordinate regression
A novel formulation of scene coordinate regression as two separate tasks, object instance recognition and local coordinate regression, allows predicting accurate 3D geometry of static objects and estimating the 6-DoF camera pose on maps several orders of magnitude larger than previously attempted by scene coordinate regression methods.