Bags of Local Convolutional Features for Scalable Instance Search

Eva Mohedano, Amaia Salvador, Kevin McGuinness, Ferran Marqués, Noel E. O'Connor, Xavier Giro-i-Nieto
Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval
This work proposes a simple instance retrieval pipeline based on encoding the convolutional features of a CNN using the bag-of-words (BoW) aggregation scheme. Assigning each local array of activations in a convolutional layer to a visual word produces an assignment map, a compact representation that relates regions of an image to visual words. We use the assignment map for fast spatial reranking, obtaining object localizations that are used for query expansion. We demonstrate the suitability…
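The assignment-map idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `assignment_map` and `bow_histogram`, the random features and centroids (standing in for real CNN activations and a vocabulary learned with k-means), and the L2 normalization of the histogram are all assumptions made for illustration.

```python
import numpy as np


def assignment_map(features, centroids):
    """Map each local descriptor of an (H, W, D) feature map to its nearest visual word.

    `centroids` is a (K, D) vocabulary; the result is an (H, W) array of word ids.
    """
    h, w, d = features.shape
    flat = features.reshape(-1, d)  # (H*W, D) local descriptors
    # Squared Euclidean distance from every descriptor to every centroid: (H*W, K).
    d2 = (
        (flat ** 2).sum(axis=1, keepdims=True)
        - 2.0 * flat @ centroids.T
        + (centroids ** 2).sum(axis=1)
    )
    return d2.argmin(axis=1).reshape(h, w)


def bow_histogram(amap, vocab_size):
    """Count visual words in an assignment map (or a crop of it) and L2-normalize.

    The normalization is an assumption of this sketch, not taken from the source.
    """
    hist = np.bincount(amap.ravel(), minlength=vocab_size).astype(np.float64)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


# Hypothetical stand-ins: a 4x6 feature map with 8 channels and a 5-word vocabulary.
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 6, 8))
centroids = rng.standard_normal((5, 8))

amap = assignment_map(features, centroids)
bow = bow_histogram(amap, vocab_size=5)
```

Because the assignment map keeps the spatial layout, a BoW descriptor for any window can be computed by calling `bow_histogram` on a slice such as `amap[1:3, 2:5]`, which is the property that makes fast spatial reranking possible.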

Deep Encoding Features for Instance Retrieval

This paper first locates several candidate regions of the target object with a region proposal network (RPN), instead of an exhaustive sliding-window method, and obtains the region-wise convolutional feature maps (CFMs) by forwarding them through an ROI pooling layer.

Class-Weighted Convolutional Features for Visual Instance Search

This paper proposes a local-aware encoding of convolutional features based on semantic information predicted in the target image. It obtains the most discriminative regions of an image using Class Activation Maps (CAMs), a technique based on the knowledge contained in the network, with the additional advantage of not requiring external information.

Class Weighted Convolutional Features for Image Retrieval

This work employs Class Activation Maps (CAMs) to obtain the most discriminative regions of the image from a semantic perspective and demonstrates that this system is competitive with, and even outperforms, the current state of the art when using off-the-shelf models trained on the object classes of ImageNet.

Saliency Weighted Convolutional Features for Instance Search

A retrieval framework based on bags of local convolutional features (BLCF) that benefits from saliency weighting to build an efficient image representation that outperforms the state-of-the-art on the challenging INSTRE benchmark by a large margin and provides similar performance on the Oxford and Paris benchmarks.

Fully Unsupervised Convolutional Learning for Fast Image Retrieval

The experimental evaluation indicates the effectiveness of the proposed method in learning more efficient representations for the retrieval task, outperforming other unsupervised CNN-based retrieval techniques, as well as conventional hand-crafted feature-based approaches, on all the datasets used.

Fine-Tuning CNN Image Retrieval with No Human Annotation

It is shown that both hard-positive and hard-negative examples, selected by exploiting the geometry and the camera positions available from the 3D models, enhance the performance of particular-object retrieval.

Regional Attention Based Deep Feature for Image Retrieval

This work builds a simple and effective context-aware regional attention network that weights the attentive score of a region by considering global attentiveness, and shows a larger accuracy improvement over prior methods when combined with query expansion.

Local Deep Descriptors in Bag-of-Words for Image Retrieval

This paper shows how to use the CNN as a combination of local feature detector and extractor, without the need of feeding multiple image patches to the network, and achieves state-of-the-art performance on different datasets without re-ranking.

Effective triplet mining improves training of multi-scale pooled CNN for image retrieval

An end-to-end trainable network architecture that exploits a novel multi-scale local pooling based on the trainable aggregation layer NetVLAD and bags of local features obtained by splitting the activations, which reduces the dimensionality of the descriptor and increases retrieval performance.



Particular object retrieval with integral max-pooling of CNN activations

This work revisits both retrieval stages, namely initial search and re-ranking, by employing the same primitive information derived from the CNN, and significantly improves existing CNN-based recognition pipelines.

A Baseline for Visual Instance Retrieval with Deep Convolutional Networks

This paper presents a simple pipeline for visual instance retrieval exploiting image representations based on convolutional networks (ConvNets), and demonstrates that ConvNet image representations

Aggregating Local Deep Features for Image Retrieval

This paper shows that deep features and traditional hand-engineered features have quite different distributions of pairwise similarities, so existing aggregation methods have to be carefully re-evaluated. It reveals that, in contrast to shallow features, the simple aggregation method based on sum pooling provides the best performance for deep convolutional features.

Aggregating local descriptors into a compact image representation

This work proposes a simple yet efficient way of aggregating local image descriptors into a vector of limited dimension, which can be viewed as a simplification of the Fisher kernel representation, and shows how to jointly optimize the dimension reduction and the indexing algorithm.

DeepIndex for Accurate and Efficient Image Retrieval

This work attempts to introduce deep features into inverted-index-based image retrieval and proposes the DeepIndex framework, which finds the optimal integration of one mid-level deep feature and one high-level deep feature, taken from two different CNN architectures.

NetVLAD: CNN Architecture for Weakly Supervised Place Recognition

A convolutional neural network architecture that is trainable in an end-to-end manner directly for the place recognition task and an efficient training procedure which can be applied on very large-scale weakly labelled tasks are developed.

Query-adaptive late fusion with neural network for instance search

This paper proposes a new way of adaptively combining DPM, an object detector, in a hybrid model for visual instance search, using a late fusion technique to improve the final result, and uses a neural network to find the optimal weights for each type of query object.

Large vocabulary quantization for searching instances from videos

This paper proposed an algorithm for instance search that outperformed all submissions on the instance search dataset TRECVID 2011, and showed that the top performance is mainly due to similar scene retrieval, instead of the same instance search.

Neural Codes for Image Retrieval

A thorough discussion of several state-of-the-art techniques in image retrieval by considering the associated subproblems: image description, descriptor compression, nearest-neighbor search and query expansion, and the combined use of deep architectures and hand-crafted image representations for accurate and efficient image retrieval.

Lost in quantization: Improving particular object retrieval in large scale image databases

The state of the art in visual object retrieval from large databases is achieved by systems that are inspired by text retrieval. A key component of these approaches is that local regions of images