Corpus ID: 219558643

SEKD: Self-Evolving Keypoint Detection and Description

Yafei Song, Lingyi Cai, Jia Li, Yonghong Tian, Mingyang Li
Researchers have attempted to use deep neural networks (DNNs) to learn novel local features from images, inspired by their recent successes on a variety of vision tasks. However, existing DNN-based algorithms have not achieved comparably remarkable progress, which could be partly attributed to insufficient utilization of the interactive characteristics between the local feature detector and descriptor. To alleviate these difficulties, we emphasize two desired properties, i.e., repeatability and reliability, to…
DXSLAM: A Robust and Efficient Visual SLAM System with Deep Features
This paper shows that feature extraction with deep convolutional neural networks (CNNs) can be seamlessly incorporated into a modern SLAM framework, and the full system achieves much lower trajectory errors and much higher correct rates on all evaluated data.
RaP-Net: A Region-wise and Point-wise Weighting Network to Extract Robust Keypoints for Indoor Localization
A novel network, RaP-Net, is proposed, which explicitly addresses feature invariability with a region-wise predictor and combines it with a point-wise predictor to select reliable keypoints in an image.
Self-Supervised Keypoint Detection Based on Multi-Layer Random Forest Regressor
The proposed KeyReg showed superior performance in terms of repeatability, homography accuracy, mean matching accuracy (MMA), and localization error on the HPatches dataset compared to state-of-the-art methods.
Learnable Motion Coherence for Correspondence Pruning
A novel formulation of fitting coherent motions with a smooth function on a graph of correspondences is proposed and it is shown that this formulation allows a closed-form solution by graph Laplacian.
Discriminative and Semantic Feature Selection for Place Recognition towards Dynamic Environments
A discriminative and semantic feature selection network, dubbed DSFeat, is proposed; it estimates the pixelwise stability of features, indicating the probability that a feature is extracted from a static and stable region, and then selects features that are insensitive to dynamic interference and distinctive enough to be correctly matched.
SOLD2: Self-supervised Occlusion-aware Line Description and Detection
This work introduces the first joint detection and description of line segments in a single deep network, which is highly discriminative, while remaining robust to viewpoint changes and occlusions.


Learning Local Feature Descriptor with Motion Attribute For Vision-based Localization
A fully convolutional network named MD-Net is designed to perform motion attribute estimation and feature description simultaneously; it can be integrated into a vision-based localization algorithm to improve estimation accuracy significantly.
SuperPoint: Self-Supervised Interest Point Detection and Description
This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision and introduces Homographic Adaptation, a multi-scale, multi-homography approach for boosting interest point detection repeatability and performing cross-domain adaptation.
R2D2: Repeatable and Reliable Detector and Descriptor
This work argues that salient regions are not necessarily discriminative, and therefore can harm the performance of the description, and proposes to jointly learn keypoint detection and description together with a predictor of the local descriptor discriminativeness.
D2-Net: A Trainable CNN for Joint Description and Detection of Local Features
This work proposes an approach where a single convolutional neural network plays a dual role: It is simultaneously a dense feature descriptor and a feature detector, and shows that this model can be trained using pixel correspondences extracted from readily available large-scale SfM reconstructions, without any further annotations.
GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints
This paper proposes a novel local descriptor learning approach that integrates geometry constraints from multi-view reconstructions, which benefits the learning process in terms of data generation, data sampling and loss computation, and demonstrates its superior performance on various large-scale benchmarks.
Discriminative Learning of Deep Convolutional Feature Point Descriptors
This paper uses Convolutional Neural Networks to learn discriminant patch representations and in particular trains a Siamese network with pairs of (non-)corresponding patches to develop 128-D descriptors whose Euclidean distances reflect patch similarity and can be used as a drop-in replacement for any task involving SIFT.
L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space
The good generalization ability shown by experiments indicates that L2-Net can serve as a direct substitution for existing handcrafted descriptors; the approach also includes a progressive sampling strategy that enables the network to access billions of training samples in a few epochs.
LF-Net: Learning Local Features from Images
A novel deep architecture and training strategy are proposed to learn a local feature pipeline from scratch, using collections of images without the need for human supervision; the work shows that the network can be optimized in a two-branch setup by confining it to one branch, while preserving differentiability in the other.
TILDE: A Temporally Invariant Learned DEtector
We introduce a learning-based approach to detect repeatable keypoints under drastic imaging changes of weather and lighting conditions to which state-of-the-art keypoint detectors are surprisingly…
ContextDesc: Local Descriptor Augmentation With Cross-Modality Context
This paper proposes a unified learning framework that leverages and aggregates the cross-modality contextual information, including visual context from high-level image representation, and geometric context from 2D keypoint distribution, and proposes an effective N-pair loss that eschews the empirical hyper-parameter search and improves the convergence.