Digging Into Self-Supervised Learning of Feature Descriptors

  • Iaroslav Melekhov, Zakaria Laskar, Xiaotian Li, Shuzhe Wang, Juho Kannala
  • 2021 International Conference on 3D Vision (3DV)
Fully-supervised CNN-based approaches for learning local image descriptors have shown remarkable results in a wide range of geometric tasks. However, most of them require per-pixel ground-truth keypoint correspondence data, which is difficult to acquire at scale. To address this challenge, recent weakly- and self-supervised methods can learn feature descriptors from relative camera poses or from synthetic rigid transformations alone, such as homographies. In this work, we focus on understanding…


Neural Outlier Rejection for Self-Supervised Keypoint Learning
This work proposes a novel end-to-end self-supervised learning scheme that can effectively exploit unlabeled data to provide more reliable keypoints under various scene conditions and greatly improves the quality of feature matching and homography estimation on challenging benchmarks over the state-of-the-art.
LoFTR: Detector-Free Local Feature Matching with Transformers
The proposed method, LoFTR, uses self- and cross-attention layers in a Transformer to obtain feature descriptors that are conditioned on both images. This enables it to produce dense matches in low-texture areas, where feature detectors usually struggle to produce repeatable interest points.
Fine-Tuning CNN Image Retrieval with No Human Annotation
It is shown that both hard-positive and hard-negative examples, selected by exploiting the geometry and the camera positions available from the 3D models, enhance the performance of particular-object retrieval.
ASLFeat: Learning Local Features of Accurate Shape and Localization
  • Zixin Luo, Lei Zhou, Long Quan
  • Computer Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
This work focuses on mitigating two limitations in the joint learning of local feature detectors and descriptors, by resorting to deformable convolutional networks to densely estimate and apply local transformation in ASLFeat.
End-to-End Learning of Deep Visual Representations for Image Retrieval
This article uses a large-scale but noisy landmark dataset and develops an automatic cleaning method that produces a suitable training set for deep retrieval. It builds on the recent R-MAC descriptor, which can be interpreted as a deep and differentiable architecture, and presents improvements to enhance it.
LF-Net: Learning Local Features from Images
This work presents a novel deep architecture and a training strategy to learn a local feature pipeline from scratch, using collections of images without the need for human supervision. It shows that the network can be optimized in a two-branch setup by confining it to one branch, while preserving differentiability in the other.
Local Descriptors Optimized for Average Precision
  • Kun He, Yan Lu, S. Sclaroff
  • Computer Science
  • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
  • 2018
This paper improves the learning of local feature descriptors by optimizing the performance of descriptor matching, which is a common stage that follows descriptor extraction in local feature based pipelines, and can be formulated as nearest neighbor retrieval.
R2D2: Reliable and Repeatable Detector and Descriptor
This work argues that repeatable regions are not necessarily discriminative and can therefore lead to the selection of suboptimal keypoints, and proposes to jointly learn keypoint detection and description together with a predictor of the local descriptor discriminativeness.
L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space
The good generalization ability shown by experiments indicates that L2-Net can serve as a direct substitute for existing handcrafted descriptors. The work also uses a progressive sampling strategy that enables the network to access billions of training samples in a few epochs.
UR2KiD: Unifying Retrieval, Keypoint Detection, and Keypoint Description without Local Correspondence Supervision
In this paper, we explore how three related tasks, namely keypoint detection, description, and image retrieval can be jointly tackled using a single unified framework, which is trained without the