Aggregating local descriptors into a compact image representation

@inproceedings{Jgou2010AggregatingLD,
  title={Aggregating local descriptors into a compact image representation},
  author={Herv{\'e} J{\'e}gou and Matthijs Douze and Cordelia Schmid and Patrick P{\'e}rez},
  booktitle={2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition},
  year={2010},
  pages={3304--3311}
}
  • H. Jégou, M. Douze, C. Schmid, P. Pérez
  • Published 13 June 2010
  • Mathematics, Computer Science
  • 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
We address the problem of image search on a very large scale, where three constraints have to be considered jointly: the accuracy of the search, its efficiency, and the memory usage of the representation. We first propose a simple yet efficient way of aggregating local image descriptors into a vector of limited dimension, which can be viewed as a simplification of the Fisher kernel representation. We then show how to jointly optimize the dimension reduction and the indexing algorithm, so that… 
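The aggregation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly described VLAD formulation: each local descriptor is assigned to its nearest centroid of a small codebook, the residuals (descriptor minus centroid) are summed per centroid, and the concatenated result is L2-normalized. The function name `vlad` and the toy inputs are hypothetical and are not taken from the paper's own code; the dimension reduction and indexing stages are not shown.

```python
import math


def vlad(descriptors, centroids):
    """Aggregate local descriptors (n lists of length d) into one k*d vector.

    Assumption: `centroids` is a codebook of k vectors, e.g. learned
    beforehand with k-means on a separate training set.
    """
    k, d = len(centroids), len(centroids[0])
    v = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Index of the nearest centroid by squared Euclidean distance.
        i = min(
            range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])),
        )
        # Accumulate the residual x - c_i in the block of centroid i.
        for t in range(d):
            v[i][t] += x[t] - centroids[i][t]
    # Concatenate the k blocks and L2-normalize the result.
    flat = [val for row in v for val in row]
    norm = math.sqrt(sum(val * val for val in flat))
    return [val / norm for val in flat] if norm > 0 else flat


# Toy example: three 2-D descriptors, a codebook of two centroids.
codebook = [[0.0, 0.0], [10.0, 10.0]]
local_descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
image_vector = vlad(local_descriptors, codebook)
```

The output length depends only on the codebook size and the descriptor dimension (k × d), not on the number of local descriptors, which is what makes the representation fixed-length and comparable across images.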
Aggregating Local Image Descriptors into Compact Codes
TLDR
This paper first presents and evaluates different ways of aggregating local image descriptors into a vector and shows that the Fisher kernel achieves better performance than the reference bag-of-visual-words approach for any given vector dimension.
Large Scale Image Retrieval Using Vectors of Locally Aggregated Descriptors
TLDR
Using the inverted files of standard text search engines to exploit the VLAD representation in large-scale image search scenarios significantly outperforms BoW in terms of efficiency and effectiveness on the same hardware and software infrastructure.
Topic modeling and improvement of image representation for large-scale image retrieval
TLDR
The proposed retrieval framework has two major advantages: an aggregation strategy through soft-assignment improves the discriminative power of the representation, which has a determinative effect on the retrieval precision; and the probabilistic latent topic model enables us to handle a large variation in the object appearance.
Uniting Keypoints: Local Visual Information Fusion for Large-Scale Image Search
TLDR
Since, in each image, thousands of local features are reorganized into only dozens of groups and each group is described by a single descriptor, the total amount of descriptors in a large-scale database will be greatly reduced.
Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations
TLDR
This work focuses on diffusion, a mechanism that captures the image manifold in the feature space through a sparse linear system solver, yielding practical query times well below one second.
Deep Image Retrieval: Learning Global Representations for Image Search
TLDR
This work proposes a novel approach for instance-level image retrieval that produces a global and compact fixed-length representation for each image by aggregating many region-wise descriptors, leveraging a ranking framework and projection weights to build the region features.
Improvement of image representation for large-scale image retrieval
TLDR
This paper proposes two effective approaches to aggregate the local features of each image into a single vector, which can overcome the limitations of existing aggregation methods and make the image representation more compact and discriminative.
Enlarging the discriminability of bag-of-words representations with deep convolutional features
  • D. Manger, D. Willersinn
  • Computer Science
    2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA)
  • 2017
In this work, we propose an extension of established image retrieval models which are based on the bag-of-words representation, i.e. on models which quantize local features such as SIFT to leverage
Mixture of Subspaces Image Representation and Compact Coding for Large-Scale Image Retrieval
TLDR
This work investigates an asymmetric approach in which the probability distribution of local descriptors is modeled for each individual database image while the local descriptors of a query are used as is, and adopts a mixture model of probabilistic principal component analysis.
Global image representation using Locality-constrained Linear Coding for large-scale image retrieval
TLDR
A global image representation based on Locality-constrained Linear Coding (LLC) is proposed, with an aim to simplify the encoding process of local descriptors so as to facilitate large-scale image retrieval.

References

SHOWING 1-10 OF 36 REFERENCES
Improving Bag-of-Features for Large Scale Image Search
TLDR
A more precise representation based on Hamming embedding (HE) and weak geometric consistency constraints (WGC) is derived and this approach is shown to outperform the state-of-the-art on the three datasets.
Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search
TLDR
Estimation of the full geometric transformation of bag-of-features in the framework of approximate nearest neighbor search is complementary to the weak geometric consistency constraints and further improves the accuracy.
Large-scale image retrieval with compressed Fisher vectors
TLDR
This article shows why the Fisher representation is well-suited to the retrieval problem: it describes an image by what makes it different from other images, and why it should be compressed to reduce its memory footprint and speed up retrieval.
Lost in quantization: Improving particular object retrieval in large scale image databases
The state of the art in visual object retrieval from large databases is achieved by systems that are inspired by text retrieval. A key component of these approaches is that local regions of images
Packing bag-of-features
TLDR
An approximate representation of bag-of-features is proposed, obtained by projecting the corresponding histogram onto a set of pre-defined sparse projection functions, producing several image descriptors; it is at least one order of magnitude faster than standard bag-of-features while providing excellent search quality.
Object retrieval with large vocabularies and fast spatial matching
TLDR
To improve query performance, this work adds an efficient spatial verification stage to re-rank the results returned from the bag-of-words model and shows that this consistently improves search quality, though by less of a margin when the visual vocabulary is large.
Evaluation of GIST descriptors for web-scale image search
TLDR
This paper evaluates the search accuracy and complexity of the global GIST descriptor for two applications, for which a local description is usually preferred: same location/object recognition and copy detection, and proposes an indexing strategy for global descriptors that optimizes the trade-off between memory usage and precision.
Scalable Recognition with a Vocabulary Tree
  • D. Nistér, Henrik Stewénius
  • Computer Science
    2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)
  • 2006
TLDR
A recognition scheme that scales efficiently to a large number of objects and allows a larger and more discriminatory vocabulary to be used efficiently is presented, which it is shown experimentally leads to a dramatic improvement in retrieval quality.
Fisher Kernels on Visual Vocabularies for Image Categorization
  • F. Perronnin, C. Dance
  • Mathematics, Computer Science
    2007 IEEE Conference on Computer Vision and Pattern Recognition
  • 2007
TLDR
This work shows that Fisher kernels can actually be understood as an extension of the popular bag-of-visterms, and proposes to apply this framework to image categorization where the input signals are images and where the underlying generative model is a visual vocabulary: a Gaussian mixture model which approximates the distribution of low-level features in images.
Video Google: a text retrieval approach to object matching in videos
We describe an approach to object and scene retrieval which searches for and localizes all the occurrences of a user outlined object in a video. The object is represented by a set of viewpoint