Video Google: a text retrieval approach to object matching in videos

@article{Sivic2003VideoGA,
  title={Video Google: a text retrieval approach to object matching in videos},
  author={Josef Sivic and Andrew Zisserman},
  journal={Proceedings Ninth IEEE International Conference on Computer Vision},
  year={2003},
  pages={1470-1477 vol.2}
}
  • Josef Sivic, Andrew Zisserman
  • Published 13 October 2003
  • Computer Science
  • Proceedings Ninth IEEE International Conference on Computer Vision
We describe an approach to object and scene retrieval which searches for and localizes all the occurrences of a user outlined object in a video. The object is represented by a set of viewpoint invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject unstable regions and reduce the effects of noise in the descriptors… 
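
The approach summarized above is, in essence, the bag-of-visual-words model: viewpoint invariant region descriptors are vector-quantized into a "visual vocabulary", each shot is summarized as a tf-idf weighted word-frequency vector, and shots are ranked by the normalized scalar product (cosine similarity) against the query vector. The following sketch, in NumPy with hypothetical data and helper names, is an illustrative outline of that pipeline rather than the authors' implementation.

import numpy as np

# Illustrative sketch of bag-of-visual-words retrieval with tf-idf weighting
# (hypothetical data and helper names; not the authors' implementation).

def quantize(descriptors, vocabulary):
    """Assign each region descriptor to its nearest visual word (cluster centre)."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def word_histogram(word_ids, vocab_size):
    """Count how often each visual word occurs in a shot (or in the query region)."""
    return np.bincount(word_ids, minlength=vocab_size).astype(float)

def idf(shot_hists):
    """Inverse document frequency: down-weight words that occur in many shots."""
    df = (shot_hists > 0).sum(axis=0)
    return np.log(shot_hists.shape[0] / np.maximum(df, 1))

def tf_idf(hist, idf_weights):
    """Term frequency (normalized word counts) times inverse document frequency."""
    return (hist / max(hist.sum(), 1.0)) * idf_weights

def rank_shots(query_hist, shot_hists):
    """Rank shots by cosine similarity between tf-idf weighted word vectors."""
    w = idf(shot_hists)
    q = tf_idf(query_hist, w)
    docs = np.array([tf_idf(h, w) for h in shot_hists])
    scores = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-scores), scores

# Toy example: a 5-word vocabulary of 128-D centres and three shots.
rng = np.random.default_rng(0)
vocab = rng.normal(size=(5, 128))
shot_hists = np.array([[4, 0, 1, 0, 0],
                       [0, 3, 0, 2, 0],
                       [1, 1, 1, 1, 1]], dtype=float)
query_descriptors = vocab[[0, 0, 2]] + 0.01 * rng.normal(size=(3, 128))
query_hist = word_histogram(quantize(query_descriptors, vocab), vocab_size=5)
order, scores = rank_shots(query_hist, shot_hists)
print(order)  # shot 0 ranks first: it shares the query's dominant visual words

In the paper, the vocabulary is built offline by clustering descriptors from the corpus (e.g. with k-means), and the ranked list is further refined using the spatial consistency of matched regions; both steps are omitted here for brevity.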

Citations

Video Google: Efficient Visual Search of Videos

An approach to object retrieval which searches for and localizes all the occurrences of an object in a video, given a query image of the object, and returns a ranked list of shots in the manner of Google.

Efficient object retrieval from videos

We describe an approach to video object retrieval which enables all shots containing the object to be returned in a manner, and with a speed, similar to a Google search for text. The object is…

Efficient Visual Search of Videos Cast as Text Retrieval

An approach to object retrieval which searches for and localizes all the occurrences of an object in a video, given a query image of the object, and investigates retrieval performance with respect to different quantizations of region descriptors and compares the performance of several ranking measures.
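
The ranking measures referred to here build on the standard term frequency-inverse document frequency (tf-idf) weighting from text retrieval, which the original approach applies to visual words. As a reference point, each visual word i in a document (frame or shot) d receives the weight

\[
t_i = \frac{n_{id}}{n_d}\,\log\frac{N}{n_i},
\]

where n_{id} is the number of occurrences of word i in document d, n_d is the total number of words in d, n_i is the number of documents in the database containing word i, and N is the total number of documents.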

Efficient Visual Content Retrieval and Mining in Videos

An image representation for objects and scenes, consisting of a configuration of viewpoint covariant regions and their descriptors, is described; it enables recognition to proceed successfully despite changes in scale, viewpoint, illumination and partial occlusion.

Efficient Visual Search for Objects in Videos

An approach that generalizes the concept of text-based search to nontextual information, describing how objects or scenes in a movie can be retrieved with the ease, speed, and accuracy with which Google retrieves web pages containing particular words, by specifying the query as an image of the object or scene.

Visual search using text-retrieval methods can rapidly and accurately locate objects in videos despite changes in camera viewpoint, lighting, and partial occlusions.

Three research directions for the presented video retrieval approach are discussed: 1) building visual vocabularies for very large-scale retrieval; 2) retrieval of 3-D objects; and 3) more thorough verification and ranking using the spatial structure of objects.

Image Retrieval Using Textual Cues

An approach for the text-to-image retrieval problem based on textual content present in images, where the retrieval performance is evaluated on public scene text datasets as well as three large datasets, namely IIIT scene text retrieval, Sports-10K and TV series-1M.

Relevance Assessment for Visual Video Re-ranking

A visual model is built from the images and used to rank the videos by relevance to the object of interest, based on local image features and building on wide baseline stereo matching.

Visual words based spatiotemporal sequence matching in video copy detection

A novel content-based copy retrieval scheme for video copy identification that takes into account spatial and temporal distances between a query clip and the clips stored in the database of the videos' rightful owners.

Video-based image retrieval

A novel QBE-based image retrieval system is proposed in which users can submit a short video clip as a query, improving both the retrieval reliability and the relevance of the retrieved results.
...

References

Showing 1-10 of 20 references

Automated Scene Matching in Movies

It is demonstrated that wide baseline matching techniques can be successfully employed for this task, matching key frames between shots by representing each frame as a set of viewpoint invariant local feature vectors.

Object recognition from local scale-invariant features

  • D. Lowe
  • Computer Science
    Proceedings of the Seventh IEEE International Conference on Computer Vision
  • 1999
Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

A performance evaluation of local descriptors

It is observed that the ranking of the descriptors is mostly independent of the interest region detector, that the SIFT-based descriptors perform best, and that moments and steerable filters show the best performance among the low-dimensional descriptors.

Object Recognition using Local Affine Frames on Distinguished Regions

A novel approach to appearance-based object recognition, based on matching of local image features, reliably recognises objects under very different viewing conditions; it is invariant to piecewise-affine image deformations, yet remains very discriminative.

Wide Baseline Stereo Matching based on Local, Affinely Invariant Regions

This work presents an alternative method for extracting invariant regions that does not depend on the presence of edges or corners in the image but is purely intensity-based, and demonstrates the use of such regions for a further application: wide baseline stereo matching.

Reliable feature matching across widely separated views

  • A. Baumberg
  • Computer Science
    Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662)
  • 2000
A robust method for automatically matching features in images corresponding to the same physical point on an object seen from two arbitrary viewpoints, optimised for a structure-from-motion application in which unreliable matches are ignored at the expense of reducing the number of feature matches.

The Truth about Corel - Evaluation in Image Retrieval

This article compares different ways of evaluating the performance of content-based image retrieval systems on a subset of the Corel images, showing how easy it is to obtain differing results even when using the same image collection, the same CBIR system and the same performance measures.

Local feature view clustering for 3D object recognition

  • D. Lowe
  • Computer Science
    Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001
  • 2001
This paper presents a method for combining multiple images of a 3D object into a single model representation that provides for recognition of 3D objects from any viewpoint, the generalization of models to non-rigid changes, and improved robustness through the combination of features acquired under a range of imaging conditions.

Combining Appearance and Topology for Wide Baseline Matching

This paper incorporates topological constraints into an existing matching algorithm which matches image intensity profiles between interest points, and shows that the algorithm can be improved by exploiting the constraint that the intensity profiles around each interest point should be cyclically ordered.