Augmenting Visual Place Recognition With Structural Cues

@article{Oertel2020AugmentingVP,
  title={Augmenting Visual Place Recognition With Structural Cues},
  author={Amadeus Oertel and Titus Cieslewski and Davide Scaramuzza},
  journal={IEEE Robotics and Automation Letters},
  year={2020},
  volume={5},
  pages={5534-5541}
}
In this letter, we propose to augment image-based place recognition with structural cues. Specifically, these structural cues are obtained using structure-from-motion, such that no additional sensors are needed for place recognition. This is achieved by augmenting the 2D convolutional neural network (CNN) typically used for image-based place recognition with a 3D CNN that takes as input a voxel grid derived from the structure-from-motion point cloud. We evaluate different methods for fusing the… 
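The architecture described in the abstract can be pictured with a minimal PyTorch sketch: a 2D CNN branch encodes the image, a 3D CNN branch encodes a voxel grid derived from the structure-from-motion point cloud, and the two descriptors are fused into one place descriptor. Layer sizes and the concatenation-based fusion are illustrative assumptions, not the authors' exact design (the letter compares several fusion variants).

```python
import torch
import torch.nn as nn

class ImageStructureFusion(nn.Module):
    def __init__(self, desc_dim=128):
        super().__init__()
        # 2D branch: image -> appearance descriptor
        self.cnn2d = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, desc_dim),
        )
        # 3D branch: voxel occupancy grid from SfM points -> structural descriptor
        self.cnn3d = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, desc_dim),
        )

    def forward(self, image, voxels):
        # image: (B, 3, H, W), voxels: (B, 1, D, H, W)
        d_img = self.cnn2d(image)
        d_str = self.cnn3d(voxels)
        # simple fusion by concatenation (one of several possible choices)
        return torch.cat([d_img, d_str], dim=1)

desc = ImageStructureFusion()(torch.rand(2, 3, 224, 224), torch.rand(2, 1, 32, 32, 32))
print(desc.shape)  # torch.Size([2, 256])
```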
AdaFusion: Visual-LiDAR Fusion with Adaptive Weights for Place Recognition
TLDR
An adaptive weighting visual-LiDAR fusion method, named AdaFusion, is proposed to learn the weights for both image and point cloud features, and a two-stage fusion approach is designed to combine the 2D and 3D attention.
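The adaptive weighting idea can be sketched as a small network that predicts one weight per modality and rescales each descriptor before fusion. This is a hedged illustration: the dimensions and the weighting network below are assumptions, not AdaFusion's actual design.

```python
import torch
import torch.nn as nn

class AdaptiveWeightFusion(nn.Module):
    def __init__(self, img_dim=256, pc_dim=256):
        super().__init__()
        # predicts one weight per modality from the concatenated descriptors
        self.weight_net = nn.Sequential(
            nn.Linear(img_dim + pc_dim, 64), nn.ReLU(),
            nn.Linear(64, 2), nn.Softmax(dim=1),
        )

    def forward(self, img_feat, pc_feat):
        w = self.weight_net(torch.cat([img_feat, pc_feat], dim=1))
        # rescale each branch by its learned weight, then fuse
        return torch.cat([w[:, :1] * img_feat, w[:, 1:] * pc_feat], dim=1)

fused = AdaptiveWeightFusion()(torch.rand(4, 256), torch.rand(4, 256))
print(fused.shape)  # torch.Size([4, 512])
```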
Place recognition survey: An update on deep learning approaches
TLDR
This survey reviews recent approaches and methods used in place recognition, particularly those based on deep learning, and highlights the importance of NetVLAD for supervised end-to-end learning and the advantages of unsupervised approaches in place recognition, namely for cross-domain applications.
AttDLNet: Attention-based DL Network for 3D LiDAR Place Recognition
TLDR
A novel 3D LiDAR-based deep learning network (named AttDLNet) is proposed that comprises an encoder network and exploits an attention mechanism to selectively focus on long-range context and inter-feature relationships.
CORAL: Colored structural representation for bi-modal place recognition
TLDR
A bi-modal place recognition method is proposed that extracts a compound global descriptor from the two modalities, vision and LiDAR, and shows superior performance against other state-of-the-art methods.
Location Identification and Personalized Recommendation of Tourist Attractions Based on Image Processing
TLDR
This paper puts forward a novel method for location identification and personalized recommendation of tourist attractions based on image processing and hash retrieval, which is shown to be accurate and effective in experiments.
MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition
TLDR
This work introduces a discriminative multimodal descriptor based on a pair of sensor readings: a point cloud from a LiDAR and an image from an RGB camera. It uses a late-fusion approach, where each modality is processed separately and fused in the final part of the processing pipeline.
Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition
TLDR
Patch-NetVLAD is introduced, which provides a novel formulation for combining the advantages of both local and global descriptor methods by deriving patch-level features from NetVLAD residuals; it is well suited to enhance both stand-alone place recognition capabilities and the overall performance of SLAM systems.
Scan Context++: Structural Place Recognition Robust to Rotation and Lateral Variations in Urban Environments
TLDR
This paper addresses structural place recognition, recognizing a place from its structural appearance as captured by range sensors. It extends previous work on a rotation-invariant spatial descriptor and introduces two sub-descriptors, thereby bridging the gap between topological place retrieval and metric localization.
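A minimal numpy sketch of a scan-context-style structural descriptor, in the spirit of the summary above: points from a range scan are binned into rings and sectors in polar coordinates, and each bin keeps the maximum height. The bin counts are illustrative assumptions, and the paper's rotation handling and sub-descriptors are omitted.

```python
import numpy as np

def scan_context(points, num_rings=20, num_sectors=60, max_range=80.0):
    """points: (N, 3) array of x, y, z coordinates from a range sensor."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2)
    theta = np.arctan2(y, x)  # azimuth in [-pi, pi]
    ring = np.clip((r / max_range * num_rings).astype(int), 0, num_rings - 1)
    sector = np.clip(((theta + np.pi) / (2 * np.pi) * num_sectors).astype(int),
                     0, num_sectors - 1)
    desc = np.zeros((num_rings, num_sectors))
    np.maximum.at(desc, (ring, sector), z)  # keep the max height per bin
    return desc

print(scan_context(np.random.rand(1000, 3) * 40).shape)  # (20, 60)
```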
SeqNet: Learning Descriptors for Sequence-Based Hierarchical Place Recognition
TLDR
A novel hybrid system is presented that creates a high-performance initial match hypothesis generator using short learnt sequential descriptors, which enable selective control of sequential score aggregation using single-image learnt descriptors.
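A hedged sketch of hierarchical sequence-based matching as summarized above: a sequence-level descriptor (here a plain mean of per-image descriptors, standing in for a learnt sequential descriptor) shortlists reference candidates, which are then re-scored with single-image descriptors. The function name and aggregation choice are assumptions for illustration only.

```python
import numpy as np

def hierarchical_match(query_seq, ref_descs, shortlist_k=5):
    """query_seq: (L, D) descriptors of the query sequence; ref_descs: (M, D)."""
    seq_desc = query_seq.mean(axis=0)                    # coarse sequence descriptor
    seq_desc /= np.linalg.norm(seq_desc)
    refs = ref_descs / np.linalg.norm(ref_descs, axis=1, keepdims=True)
    shortlist = np.argsort(refs @ seq_desc)[::-1][:shortlist_k]   # coarse retrieval
    # fine re-ranking with a single-image descriptor (here the last in the sequence)
    q = query_seq[-1] / np.linalg.norm(query_seq[-1])
    best = shortlist[np.argmax(refs[shortlist] @ q)]
    return best, shortlist

best, shortlist = hierarchical_match(np.random.rand(5, 128), np.random.rand(100, 128))
print(best, shortlist)
```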
SeqNetVLAD vs PointNetVLAD: Image Sequence vs 3D Point Clouds for Day-Night Place Recognition
TLDR
A 3D point cloud-based method (PointNetVLAD) is compared with image sequence-based methods (SeqNet and others), and it is shown that image sequence-based techniques approach, and can even surpass, the performance achieved by point cloud-based methods for a given metric span.

References

Showing 1–10 of 59 references
ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras
TLDR
ORB-SLAM2, a complete simultaneous localization and mapping (SLAM) system for monocular, stereo, and RGB-D cameras, including map reuse, loop closing, and relocalization capabilities, is presented; it is in most cases the most accurate SLAM solution.
Fine-Tuning CNN Image Retrieval with No Human Annotation
TLDR
It is shown that both hard-positive and hard-negative examples, selected by exploiting the geometry and the camera positions available from the 3D models, enhance the performance of particular-object retrieval.
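Hard-example mining of this kind can be illustrated with a batch-hard triplet loss; here place labels stand in for the geometric selection via 3D models and camera positions described in the paper. A minimal PyTorch sketch under those assumptions:

```python
import torch
import torch.nn.functional as F

def hard_triplet_loss(desc, labels, margin=0.1):
    """desc: (B, D) L2-normalized descriptors; labels: (B,) place identifiers."""
    dist = torch.cdist(desc, desc)                        # pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos_dist = (dist * same.float()).max(dim=1).values    # hardest positive per anchor
    neg_dist = dist.masked_fill(same, float('inf')).min(dim=1).values  # hardest negative
    return F.relu(pos_dist - neg_dist + margin).mean()

desc = F.normalize(torch.rand(8, 128), dim=1)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(hard_triplet_loss(desc, labels))
```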
Multi-Process Fusion: Visual Place Recognition Using Multiple Image Processing Methods
TLDR
A novel “multi-sensor” fusion approach is applied to multiple image processing methods for a single visual image stream, combined with a dynamic sequence-matching-length technique and an automatic weighting scheme, enabling reduced localization latencies through analysis of recognition quality metrics when re-entering familiar locations.
NetVLAD: CNN Architecture for Weakly Supervised Place Recognition
TLDR
A convolutional neural network architecture that is trainable in an end-to-end manner directly for the place recognition task is developed, together with an efficient training procedure that can be applied to very large-scale weakly labelled tasks.
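A minimal sketch of a NetVLAD-style aggregation layer: local CNN features are soft-assigned to learned cluster centres, and the weighted residuals to each centre are aggregated, normalized, and flattened into a global descriptor. Dimensions are illustrative assumptions; this is not the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLADLayer(nn.Module):
    def __init__(self, dim=64, num_clusters=8):
        super().__init__()
        self.conv = nn.Conv2d(dim, num_clusters, kernel_size=1)  # soft assignment
        self.centroids = nn.Parameter(torch.rand(num_clusters, dim))

    def forward(self, x):
        # x: (B, C, H, W) local features from a backbone CNN
        B, C, H, W = x.shape
        soft_assign = F.softmax(self.conv(x), dim=1)              # (B, K, H, W)
        x_flat = x.view(B, C, -1)                                 # (B, C, N)
        a = soft_assign.view(B, -1, H * W)                        # (B, K, N)
        # weighted residuals: sum_n a_kn * x_n  -  (sum_n a_kn) * c_k
        vlad = torch.einsum('bkn,bcn->bkc', a, x_flat) \
             - a.sum(dim=2, keepdim=True) * self.centroids.unsqueeze(0)
        vlad = F.normalize(vlad, dim=2)                           # intra-normalization
        return F.normalize(vlad.flatten(1), dim=1)                # (B, K*C)

desc = NetVLADLayer()(torch.rand(2, 64, 14, 14))
print(desc.shape)  # torch.Size([2, 512])
```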
Efficient descriptor learning for large scale localization
TLDR
This work proposes a novel approach to compress descriptors while increasing their discriminability and matchability, based on recent advances in neural networks, and shows the importance of adding contextual appearance information to the visual feature in order to improve matching under strong viewpoint, illumination, and scene changes.
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
TLDR
This paper designs a novel type of neural network that directly consumes point clouds, respecting the permutation invariance of points in the input, and provides a unified architecture for applications ranging from object classification and part segmentation to scene semantic parsing.
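The permutation-invariance idea can be sketched as a shared per-point MLP followed by a symmetric max-pool. Sizes are illustrative assumptions, and the paper's input/feature transform networks are omitted.

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    def __init__(self, out_dim=256):
        super().__init__()
        # shared MLP applied to every point independently (implemented as 1x1 convs)
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, out_dim, 1), nn.ReLU(),
        )

    def forward(self, pts):
        # pts: (B, 3, N) point coordinates
        feat = self.mlp(pts)                  # (B, out_dim, N) per-point features
        return feat.max(dim=2).values         # symmetric pooling -> order-invariant

g = TinyPointNet()(torch.rand(2, 3, 1024))
print(g.shape)  # torch.Size([2, 256])
```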
ORB: An efficient alternative to SIFT or SURF
TLDR
This paper proposes a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise, and demonstrates through experiments that ORB is two orders of magnitude faster than SIFT while performing as well in many situations.
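For reference, OpenCV exposes ORB through cv2.ORB_create; the snippet below runs on a synthetic image so no input file is assumed.

```python
import cv2
import numpy as np

# synthetic test image so the snippet runs without an input file
img = (np.random.rand(480, 640) * 255).astype(np.uint8)
orb = cv2.ORB_create(nfeatures=1000)
keypoints, descriptors = orb.detectAndCompute(img, None)
print(len(keypoints), None if descriptors is None else descriptors.shape)
```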
Distinctive Image Features from Scale-Invariant Keypoints
This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are ...
Place Recognition in Semi-Dense Maps: Geometric and Learning-Based Approaches.
TLDR
This paper proposes to represent a scene’s structure with semi-dense point clouds, owing to their high informative power and the simplicity of their generation through mature visual odometry and SLAM systems, and is the first to propose place recognition in semi-dense maps.
Point cloud descriptors for place recognition using sparse visual information
TLDR
A novel structural descriptor is proposed that aggregates sparse triangulated landmarks from SLAM into a compact signature, providing a discriminative fingerprint to recognize places over seasonal and viewpoint changes, which is particularly challenging for approaches based on sparse visual descriptors.