Lost in quantization: Improving particular object retrieval in large scale image databases

@inproceedings{Philbin2008LostInQuantization,
  title={Lost in quantization: Improving particular object retrieval in large scale image databases},
  author={James Philbin and Ondřej Chum and Michael Isard and Josef Sivic and Andrew Zisserman},
  booktitle={2008 IEEE Conference on Computer Vision and Pattern Recognition},
  year={2008}
}
The state of the art in visual object retrieval from large databases is achieved by systems that are inspired by text retrieval. […] The set of visual words is obtained by selecting words based on proximity in descriptor space. We describe how this representation may be incorporated into a standard tf-idf architecture, and how spatial verification is modified in the case of this soft-assignment. We evaluate our method on the standard Oxford Buildings dataset, and introduce a new dataset for…
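The soft-assignment idea can be sketched as follows: each descriptor is mapped not to a single visual word but to its few nearest words, weighted by a Gaussian of the descriptor-to-centre distance. This is a minimal illustration only; the vocabulary, `k`, and `sigma` values below are toy assumptions, not the paper's settings.

```python
import numpy as np

def soft_assign(descriptor, vocabulary, k=3, sigma=0.1):
    """Assign a descriptor to its k nearest visual words with Gaussian
    weights (soft-assignment quantization). Returns (indices, weights)."""
    d2 = np.sum((vocabulary - descriptor) ** 2, axis=1)  # squared distances to all words
    nearest = np.argsort(d2)[:k]                         # k closest words
    w = np.exp(-d2[nearest] / (2 * sigma ** 2))          # distance-based weights
    return nearest, w / w.sum()                          # weights normalized to sum to 1

# toy example: a 4-word vocabulary in 2-D descriptor space
vocab = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
idx, wts = soft_assign(np.array([0.1, 0.05]), vocab, k=2, sigma=0.5)
```

In a retrieval index, each weight would then contribute fractionally to the image's tf-idf histogram instead of a single hard count.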

Figures and Tables from this paper

Citations
Improving image retrieval by introducing locality sensitive encoding to visual similarity measures
A probabilistic framework to model feature-to-feature similarity for high-dimensional local features, together with a function to score the individual feature-to-feature contributions to an image-to-image similarity, is presented; the approach consistently gives a significant boost to retrieval accuracy.
Dataset adaptation for the evaluation of small object retrieval in heavily cluttered background
  • D. Manger
  • Computer Science
    Commercial + Scientific Sensing and Imaging
  • 2018
This work proposes adapting public datasets to generate novel evaluation setups that are closer to the problem of small object retrieval, and shows that the new setups make it easier to focus on specific characteristics, such as object size, during evaluation.
Advancing large scale object retrieval
It is shown that issuing multiple queries significantly improves recall and enables the system to find quite challenging occurrences of the queried object, and a method for automatically determining the title and sculptor of an imaged sculpture using the proposed smooth object retrieval system is described.
Improving Large-Scale Image Retrieval Through Robust Aggregation of Local Descriptors
  • S. Husain, M. Bober
  • Computer Science
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2017
This paper presents a novel method for deriving a compact and distinctive representation of image content called Robust Visual Descriptor with Whitening (RVD-W), which significantly advances the state of the art and delivers world-class performance.
Towards Large-scale Image Retrieval with a Disk-only Index
This paper argues that image retrieval with local features is worth reconsidering: large solid-state disks now provide dramatically shorter access times, and more discriminative models that enhance local features by encoding their spatial neighborhood with convolutional-neural-network features require far fewer random memory reads.
Improving Bag-of-Features for Large Scale Image Search
A more precise representation based on Hamming embedding (HE) and weak geometric consistency constraints (WGC) is derived and this approach is shown to outperform the state-of-the-art on the three datasets.
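A rough sketch of the Hamming-embedding idea: within a quantization cell, a descriptor is refined into a short binary signature by projecting it and thresholding each dimension, and two features are accepted as a match only if their signatures are close in Hamming distance. The projection, signature length, medians, and threshold below are illustrative assumptions, not the paper's trained values.

```python
import numpy as np

rng = np.random.default_rng(0)

def he_signature(desc, projection, medians):
    """Binary signature: project the descriptor, then threshold each
    projected dimension against the per-cell median (Hamming embedding)."""
    return (projection @ desc > medians).astype(np.uint8)

def he_match(sig_a, sig_b, max_hamming=12):
    """Accept a match only if the two signatures agree closely."""
    return int(np.sum(sig_a != sig_b)) <= max_hamming

# toy setup: 32-bit signatures over 64-D descriptors (illustrative sizes)
P = rng.standard_normal((32, 64))   # random projection, assumed fixed per cell
med = np.zeros(32)                  # assume zero medians for this sketch
a = rng.standard_normal(64)
sig_same = he_signature(a, P, med)
sig_near = he_signature(a + 0.01 * rng.standard_normal(64), P, med)
```

The binary test filters out most false matches that fall into the same coarse cell, which is where the accuracy gain over plain bag-of-words comes from.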
Image Retrieval with Deep Local Features and Attention-based Keypoints
We introduce a local feature descriptor for large-scale image retrieval applications, called DELF (DEep Local Feature). The new feature is based on convolutional neural networks, which are trained ...
Memory Vectors for Particular Object Retrieval with Multiple Queries
An improved fusion of object descriptions is presented, based on the recent concept of generalized max-pooling and on memory vectors, which summarize a set of vectors by a single representative vector and have the property of reducing the influence of frequent features.
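One common construction of a memory vector can be sketched as follows, assuming the pseudo-inverse variant: choose the summary `m` so that every stored vector has (near) unit inner product with it, which keeps a repeated direction from dominating the way plain sum-pooling would.

```python
import numpy as np

def memory_vector(X):
    """Pseudo-inverse memory vector (sketch): the m with X @ m ≈ 1,
    so each stored row has near-unit inner product with the summary
    and repeated directions are not amplified."""
    return np.linalg.pinv(X) @ np.ones(X.shape[0])

# a direction stored twice is not double-counted, unlike sum-pooling
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
m = memory_vector(X)   # → close to [1, 1]; sum-pooling would give [2, 1]
```

Solving for near-unit responses is what gives the "reduced influence of frequent features" property mentioned above.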
All About VLAD
It is shown that a simple change to the normalization method significantly improves retrieval performance and vocabulary adaptation can substantially alleviate problems caused when images are added to the dataset after initial vocabulary learning.
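A minimal sketch of VLAD with one such normalization change, intra-normalization (L2-normalizing each per-cluster residual block before the global L2 step); the descriptors and cluster centres below are random toy data, not a learned vocabulary.

```python
import numpy as np

def vlad(descriptors, centroids, eps=1e-12):
    """VLAD with intra-normalization: accumulate residuals to the nearest
    centroid, L2-normalize each per-cluster block, then L2-normalize the
    whole vector."""
    k, d = centroids.shape
    v = np.zeros((k, d))
    # hard-assign each descriptor to its nearest centroid
    assign = np.argmin(
        ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)
    for x, a in zip(descriptors, assign):
        v[a] += x - centroids[a]                            # residual accumulation
    v /= np.linalg.norm(v, axis=1, keepdims=True) + eps     # intra-normalization
    v = v.ravel()
    return v / (np.linalg.norm(v) + eps)                    # global L2 normalization

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 8))   # 50 local descriptors, 8-D (toy sizes)
C = rng.standard_normal((4, 8))    # 4 cluster centres
enc = vlad(X, C)
```

Intra-normalization equalizes the energy contributed by each cluster, which is the simple change the paper credits for much of the accuracy gain.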
Large scale partial-duplicate image retrieval with bi-space quantization and geometric consistency
This paper proposes a novel bi-space quantization strategy for web image search that quantizes local features first in descriptor space and then in orientation space, improving the discriminative power of visual words and reducing their ambiguity.

References

Object retrieval with large vocabularies and fast spatial matching
To improve query performance, this work adds an efficient spatial verification stage to re-rank the results returned from the bag-of-words model and shows that this consistently improves search quality, though by less of a margin when the visual vocabulary is large.
Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval
This paper brings query expansion into the visual domain via two novel contributions: strong spatial constraints between the query image and each result allow us to accurately verify each return, suppressing the false positives which typically ruin text-based query expansion.
A contextual dissimilarity measure for accurate and efficient image search
A contextual dissimilarity measure (CDM) takes into account the local distribution of the vectors: distance-correction terms are estimated iteratively and subsequently used to update an existing distance, thereby modifying the neighborhood structure.
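The contextual idea can be sketched as one iteration of a distance-correcting update: estimate how dense each point's neighbourhood is, derive a per-point correction term, and rescale all pairwise distances symmetrically. This is an illustrative simplification, not the paper's exact update rule.

```python
import numpy as np

def cdm_update(D, k=3):
    """One iteration of a contextual dissimilarity update (sketch): scale
    each distance so every point's mean distance to its k nearest
    neighbours moves toward the global mean."""
    # mean distance from each point to its k nearest neighbours (skip self at col 0)
    nn_mean = np.sort(D, axis=1)[:, 1:k + 1].mean(axis=1)
    delta = (nn_mean.mean() / nn_mean) ** 0.5   # per-point correction term
    return D * delta[:, None] * delta[None, :]  # symmetric rescaling

# toy symmetric distance matrix: a tight cluster (0, 0.1, 0.2) plus outliers
pts = np.array([[0.0], [0.1], [0.2], [5.0], [9.0]])
D = np.abs(pts - pts.T)
D2 = cdm_update(D, k=2)
```

Distances inside the dense cluster are stretched and distances in sparse regions shrink, which is the neighborhood-regularizing effect the measure exploits.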
Scalable Recognition with a Vocabulary Tree
  • D. Nistér, Henrik Stewénius
  • Computer Science
    2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)
  • 2006
A recognition scheme that scales efficiently to a large number of objects and allows a larger and more discriminatory vocabulary to be used efficiently is presented, which it is shown experimentally leads to a dramatic improvement in retrieval quality.
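Quantization with a vocabulary tree can be sketched as a greedy descent: at every node the descriptor moves to the nearest child centre, so a feature is mapped to a leaf word in depth × branching distance computations instead of a flat search over all leaves. The tiny hand-built tree below is purely illustrative; real trees are learned by hierarchical k-means.

```python
import numpy as np

def tree_quantize(desc, tree):
    """Descend a vocabulary tree: at each node pick the nearest child
    centre; the path of choices identifies the leaf visual word."""
    path = []
    node = tree
    while node["children"]:
        d2 = [np.sum((c["centre"] - desc) ** 2) for c in node["children"]]
        i = int(np.argmin(d2))
        path.append(i)
        node = node["children"][i]
    return tuple(path)   # leaf id, e.g. (0, 1) in a depth-2 tree

def leaf(centre):
    return {"centre": np.asarray(centre, float), "children": []}

# toy depth-2 tree with branching factor 2 (centres chosen by hand)
tree = {
    "centre": None,
    "children": [
        {"centre": np.array([0.0, 0.0]),
         "children": [leaf([-1.0, 0.0]), leaf([1.0, 0.0])]},
        {"centre": np.array([10.0, 10.0]),
         "children": [leaf([9.0, 10.0]), leaf([11.0, 10.0])]},
    ],
}
word = tree_quantize(np.array([0.8, 0.1]), tree)
```

The logarithmic lookup cost is what lets the vocabulary grow to millions of words while staying fast at query time.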
Video Google: a text retrieval approach to object matching in videos
We describe an approach to object and scene retrieval which searches for and localizes all the occurrences of a user outlined object in a video. The object is represented by a set of viewpoint ...
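The text-retrieval machinery this approach borrows can be sketched as tf-idf scoring over visual-word histograms: each frame or image becomes an L2-normalized tf-idf vector, and retrieval ranks images by cosine similarity to the query. The toy "images" below are hand-made word lists over an assumed 4-word vocabulary.

```python
import numpy as np

def tfidf_vectors(word_lists, vocab_size):
    """tf-idf vectors from per-image visual-word lists, as in a
    bag-of-visual-words retrieval model."""
    n = len(word_lists)
    tf = np.zeros((n, vocab_size))
    for i, words in enumerate(word_lists):
        for w in words:
            tf[i, w] += 1
        tf[i] /= max(len(words), 1)              # term frequency
    df = (tf > 0).sum(axis=0)                    # document frequency per word
    idf = np.log(n / np.maximum(df, 1))          # inverse document frequency
    v = tf * idf
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    return v / np.where(norms == 0, 1, norms)    # L2-normalize for cosine scoring

# three toy 'images' over a 4-word vocabulary; image 0 is the query
docs = [[0, 0, 1], [0, 1, 1], [2, 3, 3]]
V = tfidf_vectors(docs, vocab_size=4)
scores = V @ V[0]   # cosine similarity of each image to the query
```

Images sharing rare words with the query score highest, while words common to many images are down-weighted by the idf term.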
Vector Quantizing Feature Space with a Regular Lattice
This paper examines the feasibility of a data- independent approach to construct such a visual vocabulary, where the feature space is discretized using a regular lattice, and introduces a class-specific feature selection step, which takes the spatial structure of SIFT-like descriptors into account.
Localisation using an image-map
A voting technique over 3D structure and the run-time efficiency of the kd-tree are improved to allow quick retrieval of similar images and quick localisation within a visual map, using images from a typical digital camera.
Robust Wide Baseline Stereo from Maximally Stable Extremal Regions
The wide-baseline stereo problem, i.e. the problem of establishing correspondences between a pair of images taken from different viewpoints, is studied and an efficient and practically fast detection algorithm is presented for an affinely-invariant stable subset of extremal regions, the maximally stable extremal region (MSER).
Scale & Affine Invariant Interest Point Detectors
A comparative evaluation of different detectors is presented and it is shown that the proposed approach for detecting interest points invariant to scale and affine transformations provides better results than existing methods.
Distinctive Image Features from Scale-Invariant Keypoints
This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are ...