CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples

@inproceedings{Radenovi2016CNNIR,
  title={CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples},
  author={Filip Radenovi{\'c} and Giorgos Tolias and Ond{\v{r}}ej Chum},
  booktitle={European Conference on Computer Vision},
  year={2016}
}
Convolutional Neural Networks (CNNs) achieve state-of-the-art performance in many computer vision tasks. [...] We employ state-of-the-art retrieval and Structure-from-Motion (SfM) methods to obtain 3D models, which are used to guide the selection of the training data for CNN fine-tuning. We show that both hard positive and hard negative examples enhance the final performance in particular object retrieval with compact codes.
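
To make the idea of hard negative examples concrete, here is a minimal sketch of one way such examples could be selected, assuming each training image already has a global descriptor and an ID for the 3D model (cluster) it belongs to. The function names, the data layout, and the rule of taking at most one negative per cluster are illustrative assumptions, not the authors' released code.

```python
import math


def l2_normalize(v):
    """Return a unit-length copy of v (v is returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity(a, b):
    """Cosine similarity between two descriptors given as lists of floats."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def mine_hard_negatives(query_desc, query_cluster, pool, num_negatives=2):
    """Pick the most similar images that do NOT share the query's 3D model.

    pool is a list of (image_id, cluster_id, descriptor) tuples (hypothetical layout).
    At most one image is taken per cluster so the negatives stay diverse.
    """
    candidates = [
        (cosine_similarity(query_desc, desc), image_id, cluster_id)
        for image_id, cluster_id, desc in pool
        if cluster_id != query_cluster  # same 3D model means a positive, so skip it
    ]
    # Most similar non-matching images first: these are the "hard" negatives.
    candidates.sort(key=lambda t: t[0], reverse=True)

    chosen, seen_clusters = [], set()
    for _, image_id, cluster_id in candidates:
        if cluster_id in seen_clusters:
            continue
        chosen.append(image_id)
        seen_clusters.add(cluster_id)
        if len(chosen) == num_negatives:
            break
    return chosen
```

In a fine-tuning loop, such a selection would typically be recomputed with the current network's descriptors, so that the negatives stay hard as the model improves.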

Fine-Tuning CNN Image Retrieval with No Human Annotation

It is shown that both hard-positive and hard-negative examples, selected by exploiting the geometry and the camera positions available from the 3D models, enhance the performance of particular-object retrieval.

Pruning Convolutional Neural Networks for Image Instance Retrieval

This work introduces both data-independent and data-dependent heuristics to prune convolutional edges, and demonstrates that the combination of heuristic pruning and fine-tuning offers a 5x compression rate without considerable loss in retrieval performance.

Object Retrieval with Deep Convolutional Features

This chapter recommends a simple pipeline, called bag of local convolutional features (BLCF), that encodes the local activations of a convolutional layer of a pretrained CNN using the well-known Bag of Words (BoW) aggregation scheme.
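
As a rough illustration of BoW aggregation over convolutional activations, the sketch below assigns each local feature (one activation vector per spatial position) to its nearest visual word and counts the assignments. The function names and the tiny hand-written codebook are hypothetical; a real pipeline would learn the codebook by clustering and would typically use a sparse representation.

```python
def nearest_word(feature, codebook):
    """Return the index of the codebook centroid closest to feature (squared L2)."""
    best_index, best_dist = 0, float("inf")
    for index, centroid in enumerate(codebook):
        dist = sum((f - c) ** 2 for f, c in zip(feature, centroid))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bow_histogram(local_features, codebook):
    """Count how many local features fall on each visual word."""
    histogram = [0] * len(codebook)
    for feature in local_features:
        histogram[nearest_word(feature, codebook)] += 1
    return histogram


# Toy usage: three 2-D local features and a two-word codebook.
codebook = [[0.0, 0.0], [1.0, 1.0]]
features = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8]]
print(bow_histogram(features, codebook))  # [1, 2]
```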

Exploring geometric information in CNN for image retrieval

This paper proposes an unsupervised weighting scheme for pre-trained CNN models that adaptively emphasizes the image center, and aggregates the activations of convolutional layers on image patches to depict local patterns in detail.

End-to-End Learning of Deep Visual Representations for Image Retrieval

This article uses a large-scale but noisy landmark dataset and develops an automatic cleaning method that produces a suitable training set for deep retrieval. It builds on the recent R-MAC descriptor, which can be interpreted as a deep and differentiable architecture, and presents improvements to enhance it.

Towards Optimal CNN Descriptors for Large-Scale Image Retrieval

This paper presents a unified implementation of modern global-CNN-based retrieval systems, breaks such a system into six major components, and investigates each part individually as well as globally under different configurations. It also introduces a novel joint loss function with a learnable parameter for fine-tuning on retrieval tasks.

Class-Weighted Convolutional Features for Visual Instance Search

This paper proposes a local-aware encoding of convolutional features based on semantic information predicted in the target image and obtains the most discriminative regions of an image using Class Activation Maps (CAMs), which is based on the knowledge contained in the network and has the additional advantage of not requiring external information.

Towards Good Practices for Image Retrieval Based on CNN Features

This work introduces a parameter-free approach to CNN feature extraction for instance-level image search that achieves state-of-the-art performance without any CNN fine-tuning or additional data.

Deep Image Retrieval: Learning Global Representations for Image Search

This work proposes a novel approach for instance-level image retrieval that produces a global and compact fixed-length representation for each image by aggregating many region-wise descriptors by leveraging a ranking framework and projection weights to build the region features.

Weakly Supervised Soft-detection-based Aggregation Method for Image Retrieval

This paper proposes a novel weakly supervised soft-detection-based aggregation (SDA) method free from bounding box annotations for image retrieval that achieves state-of-the-art performance on most benchmarks.
...

References

Deep Image Retrieval: Learning Global Representations for Image Search

This work proposes a novel approach for instance-level image retrieval that produces a global and compact fixed-length representation for each image by aggregating many region-wise descriptors by leveraging a ranking framework and projection weights to build the region features.

A Baseline for Visual Instance Retrieval with Deep Convolutional Networks

This paper presents a simple pipeline for visual instance retrieval exploiting image representations based on convolutional networks (ConvNets), and demonstrates that ConvNet image representations

Particular object retrieval with integral max-pooling of CNN activations

This work revisits both retrieval stages, namely initial search and re-ranking, by employing the same primitive information derived from the CNN, and significantly improves existing CNN-based recognition pipelines.

Visual Instance Retrieval with Deep Convolutional Networks

This paper provides an extensive study on the availability of image representations based on convolutional networks (ConvNets) for the task of visual instance retrieval and presents an efficient pipeline exploiting multi-scale schemes to extract local features by taking geometric invariance into explicit account.

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%.

Very Deep Convolutional Networks for Large-Scale Image Recognition

This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Part-Based R-CNNs for Fine-Grained Category Detection

This work proposes a model for fine-grained categorization that overcomes limitations by leveraging deep convolutional features computed on bottom-up region proposals. It learns whole-object and part detectors, enforces learned geometric constraints between them, and predicts a fine-grained category from a pose-normalized representation.

NetVLAD: CNN Architecture for Weakly Supervised Place Recognition

A convolutional neural network architecture that is trainable in an end-to-end manner directly for the place recognition task and an efficient training procedure which can be applied on very large-scale weakly labelled tasks are developed.

Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks

This work designs a method to reuse layers trained on the ImageNet dataset to compute mid-level image representation for images in the PASCAL VOC dataset, and shows that despite differences in image statistics and tasks in the two datasets, the transferred representation leads to significantly improved results for object and action classification.

DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition

DeCAF, an open-source implementation of deep convolutional activation features, is released along with all associated network parameters to enable vision researchers to conduct experimentation with deep representations across a range of visual concept learning paradigms.
...