Convolutional Neural Networks Learn Compact Local Image Descriptors

Christian Osendorfer, Justin Bayer, Sebastian Urban and Patrick van der Smagt
We investigate whether a deep Convolutional Neural Network can learn representations of local image patches that are usable in the important task of keypoint matching. We examine several possible loss functions for this correspondence task and show empirically that a newly suggested loss formulation allows a Convolutional Neural Network to find compact local image descriptors that perform comparably to state-of-the-art approaches.

Learning local feature descriptors with triplets and shallow convolutional neural networks

This work proposes to utilize triplets of training samples, together with in-triplet mining of hard negatives, and shows that this method achieves state of the art results, without the computational overhead typically associated with mining of negatives and with lower complexity of the network architecture.
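The in-triplet mining idea can be sketched as follows (a hypothetical NumPy illustration, not the paper's implementation): for a triplet (anchor, positive, negative), the loss uses whichever of d(anchor, negative) and d(positive, negative) is smaller, which amounts to swapping anchor and positive whenever that yields a harder negative, at no extra sampling cost.

```python
import numpy as np

def triplet_loss_in_triplet_mining(a, p, n, margin=1.0):
    """Triplet margin loss with in-triplet mining of the hard negative.

    a, p, n: anchor, positive and negative descriptors (1-D arrays).
    The negative distance is the smaller of d(a, n) and d(p, n), i.e.
    anchor and positive are implicitly swapped when the negative lies
    closer to the positive than to the anchor.
    """
    d_pos = np.linalg.norm(a - p)
    d_neg = min(np.linalg.norm(a - n), np.linalg.norm(p - n))
    return max(0.0, margin + d_pos - d_neg)
```

An easy triplet (negative far from both anchor and positive) yields zero loss; the mining only bites when the negative sits close to either member of the matching pair.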

DHNet: working double hard to learn a convolutional neural network-based local descriptor

A convolutional neural network-based local descriptor named DHNet is proposed, with a carefully designed sampling strategy and a dedicated loss function that give the descriptor strong discrimination ability. Experiments show that the learned descriptor performs well, achieving state-of-the-art results in terms of the false positive rate at 95% recall on standard benchmark datasets.
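The evaluation metric used throughout this literature, false positive rate at 95% recall (FPR95), can be computed as in this minimal sketch (function name and interface are illustrative, not from any of the cited papers):

```python
import numpy as np

def fpr_at_95_recall(distances, labels):
    """False positive rate at the distance threshold that accepts
    95% of the true matches.

    distances: descriptor distances for candidate patch pairs.
    labels:    1 for matching pairs, 0 for non-matching pairs.
    """
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(labels)
    pos = np.sort(distances[labels == 1])
    n_keep = int(round(0.95 * len(pos)))  # positives accepted at 95% recall
    thresh = pos[n_keep - 1]              # distance threshold at that recall
    neg = distances[labels == 0]
    return float(np.mean(neg <= thresh))  # fraction of negatives wrongly accepted
```

Lower is better: a good descriptor separates matching from non-matching pairs so that few negatives fall below the 95%-recall threshold.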

Fracking Deep Convolutional Image Descriptors

A Siamese architecture of Deep Convolutional Neural Networks, with a hinge embedding loss on the L2 distance between descriptors, is explored, with large performance gains over standard CNN learning strategies, hand-crafted image descriptors, and the state of the art in learned descriptors.
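A hinge embedding loss on the L2 distance between two descriptors can be sketched as below (the margin value and reduction used in the paper may differ; this is an illustrative form, not the authors' code):

```python
import numpy as np

def hinge_embedding_loss(desc_a, desc_b, match, margin=2.0):
    """Hinge embedding loss on the L2 distance between two descriptors.

    Corresponding pairs (match=True) are pulled together by penalizing
    their distance directly; non-corresponding pairs are pushed apart
    until their distance exceeds the margin, after which they
    contribute zero loss.
    """
    d = np.linalg.norm(desc_a - desc_b)
    if match:
        return float(d)
    return float(max(0.0, margin - d))
```

In a Siamese setup, both descriptors come from the same network applied to two patches, so gradients of this loss shape a single shared embedding.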

Discriminative Learning of Deep Convolutional Feature Point Descriptors

This paper uses Convolutional Neural Networks to learn discriminant patch representations, training a Siamese network on pairs of corresponding and non-corresponding patches. The resulting 128-D descriptors have Euclidean distances that reflect patch similarity and can be used as a drop-in replacement for SIFT in any task.
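Being a drop-in for SIFT means such descriptors plug directly into standard brute-force nearest-neighbour matching under Euclidean distance, e.g. (illustrative NumPy sketch, not tied to any specific paper's pipeline):

```python
import numpy as np

def match_descriptors(desc1, desc2):
    """For each row of desc1 (N x D), return the index of the nearest
    row of desc2 (M x D) under Euclidean distance."""
    # pairwise squared L2 distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = (np.sum(desc1 ** 2, axis=1)[:, None]
          + np.sum(desc2 ** 2, axis=1)[None, :]
          - 2.0 * desc1 @ desc2.T)
    return np.argmin(d2, axis=1)
```

The expansion of the squared distance avoids materializing all N*M difference vectors, which matters when matching thousands of 128-D descriptors per image.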

Image Patch Matching Using Convolutional Descriptors with Euclidean Distance

A neural-network-based image descriptor is proposed for image patch matching, an important task in many computer vision applications; it outperforms state-of-the-art L2-based descriptors and can be considered a direct replacement for SIFT.

Trainable Siamese keypoint descriptors for real-time applications

This paper explores a Siamese pairing of fully connected neural networks for the purpose of learning discriminative local feature descriptors, and demonstrates a consistent speedup compared to such state-of-the-art methods as SIFT and FREAK on PCs as well as in embedded systems.

HPatches: A Benchmark and Evaluation of Handcrafted and Learned Local Descriptors

A novel benchmark for evaluating local image descriptors is proposed, and it is shown that a simple normalisation of traditional hand-crafted descriptors can boost their performance to the level of deep-learning-based descriptors within a realistic benchmark evaluation.

Feature detection and description for image matching: from hand-crafted design to deep learning

The general framework of local feature description is presented and the evolution from hand-crafted feature descriptors, e.g. SIFT (Scale Invariant Feature Transform), to machine learning and deep learning based descriptors is discussed.

Resources and Future Work

Several benchmarks used for evaluating local image descriptors are introduced, and some future research directions, in the authors' view, are outlined.



Learning Object-Class Segmentation with Convolutional Neural Networks

A convolutional network architecture that includes innovative elements, such as multiple output maps, suitable loss functions, supervised pretraining, multiscale inputs, reused outputs, and pairwise class location filters is proposed.

Discriminative Learning of Local Image Descriptors

A set of building blocks for constructing descriptors which can be combined together and jointly optimized so as to minimize the error of a nearest-neighbor classifier are described.

ImageNet classification with deep convolutional neural networks

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

Multi-column deep neural networks for image classification

On the very competitive MNIST handwriting benchmark, this method is the first to achieve near-human performance and improves the state-of-the-art on a plethora of common image classification benchmarks.

Descriptor Learning Using Convex Optimisation

The objective of this work is to learn descriptors suitable for the sparse feature detectors used in viewpoint invariant matching, and it is shown that learning the pooling regions for the descriptor can be formulated as a convex optimisation problem selecting the regions using sparsity.

Learning a similarity metric discriminatively, with application to face verification

The idea is to learn a function that maps input patterns into a target space such that the L1 norm in the target space approximates the "semantic" distance in the input space.

What is the best multi-stage architecture for object recognition?

It is shown that using non-linearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks and that two stages of feature extraction yield better accuracy than one.

Distinctive Image Features from Scale-Invariant Keypoints

This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are ...

Self-organizing neural network that discovers surfaces in random-dot stereograms

The authors' simulations show that when the learning procedure is applied to adjacent patches of two-dimensional images, it allows a neural network that has no prior knowledge of the third dimension to discover depth in random dot stereograms of curved surfaces.