Video Google: a text retrieval approach to object matching in videos
We describe an approach to object and scene retrieval which searches for and localizes all the occurrences of a user outlined object in a video. The object is represented by a set of viewpoint…
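The title frames object matching as text retrieval. The usual text-retrieval recipe is tf-idf weighting of word counts with ranking by cosine similarity, and the sketch below applies it to "visual words". It assumes each keyframe has already been quantized into a list of integer visual-word ids. It uses plain dicts and a linear scan instead of an inverted file, so it illustrates the scoring and is not meant as a scalable index.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of lists of visual-word ids (one list per keyframe; assumed pre-quantized)."""
    n_docs = len(docs)
    # document frequency: in how many keyframes each visual word occurs
    df = Counter(w for doc in docs for w in set(doc))
    idf = {w: math.log(n_docs / df[w]) for w in df}
    vecs = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # term frequency (normalised by keyframe length) times inverse document frequency
        vecs.append({w: (c / total) * idf[w] for w, c in counts.items()})
    return vecs, idf

def query_vector(query, idf):
    counts = Counter(query)
    total = len(query)
    # words never seen in the database carry no weight
    return {w: (c / total) * idf.get(w, 0.0) for w, c in counts.items()}

def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(query, docs):
    """Return keyframe indices sorted by decreasing cosine similarity to the query."""
    vecs, idf = tfidf_vectors(docs)
    q = query_vector(query, idf)
    scores = [cosine(q, v) for v in vecs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])
```

For example, `rank([1, 2], [[1, 2, 3], [4, 5, 6], [1, 3, 7]])` puts keyframe 0 first, because it contains both query words, including the rarer (higher-idf) word 2.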
Object retrieval with large vocabularies and fast spatial matching
TLDR
To improve query performance, this work adds an efficient spatial verification stage to re-rank the results returned from the bag-of-words model and shows that this consistently improves search quality, though by less of a margin when the visual vocabulary is large.
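A spatial verification stage of this kind can be sketched as follows: take the top of the bag-of-words shortlist, fit a geometric transformation to each result's putative feature matches, and re-rank by inlier count. For brevity the sketch assumes a translation-only model, with every single match hypothesising one translation. This simplification stands in for the richer transformations a real verifier would fit. The match format, `tol` and `top_k` are illustrative assumptions.

```python
import math

def count_inliers(matches, tol=5.0):
    """matches: list of ((xq, yq), (xd, yd)) putative correspondences.
    Each single match hypothesises a translation (a simplifying assumption);
    return the largest number of matches consistent with any hypothesis."""
    best = 0
    for (qx, qy), (dx, dy) in matches:
        tx, ty = dx - qx, dy - qy  # translation implied by this one match
        inliers = sum(
            1 for (ax, ay), (bx, by) in matches
            if math.hypot(ax + tx - bx, ay + ty - by) <= tol
        )
        best = max(best, inliers)
    return best

def rerank(shortlist, matches_for, top_k=10, tol=5.0):
    """Re-rank the first top_k bag-of-words results by inlier count;
    results beyond top_k keep their original order."""
    head = shortlist[:top_k]
    scores = {doc: count_inliers(matches_for[doc], tol) for doc in head}
    # sorted() is stable, so ties keep their bag-of-words order
    head = sorted(head, key=lambda doc: -scores[doc])
    return head + shortlist[top_k:]
```

Only the short head of the list is verified because fitting a model per result is far costlier than a bag-of-words lookup.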
NetVLAD: CNN Architecture for Weakly Supervised Place Recognition
TLDR
A convolutional neural network architecture that is trainable end-to-end directly for the place recognition task is developed, together with an efficient training procedure that can be applied to very large-scale weakly labelled tasks.
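The core of the NetVLAD architecture is a trainable VLAD pooling layer. Each local descriptor x is softly assigned to clusters with a softmax over w_k·x + b_k. The assignment-weighted residuals x − c_k are summed per cluster, each cluster's sum is L2-normalised (intra-normalisation), and finally the whole vector is L2-normalised. The sketch below is a forward pass only, in plain Python. It assumes the descriptors (in the network, conv-layer features) and the parameters `centers`, `weights` and `biases` are given as lists. Training is not shown.

```python
import math

def netvlad(descriptors, centers, weights, biases):
    """descriptors: N local D-dim features; centers: K cluster centres c_k;
    weights/biases: per-cluster soft-assignment parameters w_k, b_k.
    Returns the flattened, normalised K*D descriptor."""
    K, D = len(centers), len(centers[0])
    V = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        # soft assignment: softmax over w_k . x + b_k (max subtracted for stability)
        logits = [sum(wi * xi for wi, xi in zip(weights[k], x)) + biases[k] for k in range(K)]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        for k in range(K):
            a = exps[k] / z
            for j in range(D):
                V[k][j] += a * (x[j] - centers[k][j])  # assignment-weighted residual
    # intra-normalisation: L2-normalise each cluster's residual sum
    for k in range(K):
        n = math.sqrt(sum(v * v for v in V[k]))
        if n > 0:
            V[k] = [v / n for v in V[k]]
    # final L2 normalisation of the flattened K*D vector
    flat = [v for row in V for v in row]
    n = math.sqrt(sum(v * v for v in flat))
    return [v / n for v in flat] if n > 0 else flat
```

Because the softmax assignment is differentiable, the centres and assignment parameters can be learned jointly with the rest of the network, whereas the hard assignment of classic VLAD cannot be trained this way.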
Lost in quantization: Improving particular object retrieval in large scale image databases
The state of the art in visual object retrieval from large databases is achieved by systems that are inspired by text retrieval. A key component of these approaches is that local regions of images…
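The summary is cut off before describing the method. The title points to the error introduced when each local descriptor is hard-quantized to a single visual word. A common remedy, and to our understanding the one proposed here, is soft assignment: each descriptor is mapped to its r nearest words with weights that decay with distance. The sketch below assumes a Gaussian weighting exp(−d²/(2σ²)). The default values of `r` and `sigma2` and the normalisation of the weights to sum to 1 are illustrative assumptions, not values taken from the paper.

```python
import math

def soft_assign(descriptor, vocabulary, r=3, sigma2=1.0):
    """Return {word_id: weight} over the r nearest visual words.
    Weights follow exp(-d^2 / (2 * sigma2)); normalising them to sum to 1
    is an assumption made for this sketch."""
    d2 = sorted(
        (sum((a - b) ** 2 for a, b in zip(descriptor, c)), k)
        for k, c in enumerate(vocabulary)
    )
    nearest = d2[:r]
    w = {k: math.exp(-dist / (2 * sigma2)) for dist, k in nearest}
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}
```

A descriptor lying midway between two words, for example, then contributes equally to both instead of being forced into one.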
Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks
TLDR
This work designs a method to reuse layers trained on the ImageNet dataset to compute mid-level image representations for images in the PASCAL VOC dataset, and shows that, despite differences in image statistics and tasks between the two datasets, the transferred representation leads to significantly improved results for object and action classification.
Is object localization for free? - Weakly-supervised learning with convolutional neural networks
TLDR
A weakly supervised convolutional neural network is described for object classification that relies only on image-level labels, yet can learn from cluttered scenes containing multiple objects.
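One way to train with only image-level labels and still obtain localization is to compute a per-class score map over image positions, pool it with a global max, and apply a per-class logistic loss to the pooled score. The position of the maximum then serves as a coarse object location. The sketch below assumes this design, with labels in {+1, −1}. We believe it matches the approach summarised above, but treat that as an assumption. The score maps are given as plain 2-D lists rather than produced by a network.

```python
import math

def global_max_pool(score_maps):
    """score_maps: {class: 2-D list of scores}.
    Returns {class: (max_score, (row, col))}; the argmax is a coarse location."""
    out = {}
    for cls, grid in score_maps.items():
        out[cls] = max(
            ((v, (i, j)) for i, row in enumerate(grid) for j, v in enumerate(row)),
            key=lambda t: t[0],
        )
    return out

def image_level_loss(pooled, labels):
    """Sum of per-class logistic losses log(1 + exp(-y * s)),
    using only image-level labels y in {+1, -1}."""
    return sum(math.log1p(math.exp(-labels[c] * s)) for c, (s, _) in pooled.items())
```

Since only the maximum-scoring position receives gradient for each class, training pushes the network to fire strongly at one location of a present object, which is why localization comes "for free".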
SIFT Flow: Dense Correspondence across Different Scenes
TLDR
A method is described for aligning an image to its neighbors in a large image collection consisting of a variety of scenes; the SIFT flow algorithm is then used in two applications: motion field prediction from a single static image and motion synthesis via transfer of moving objects.
NetVLAD: CNN Architecture for Weakly Supervised Place Recognition
TLDR
A convolutional neural network architecture that is trainable in an end-to-end manner directly for the place recognition task, and significantly outperforms non-learnt image representations and off-the-shelf CNN descriptors on two challenging place recognition benchmarks.
Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval
TLDR
This paper brings query expansion into the visual domain via two novel contributions: strong spatial constraints between the query image and each result allow us to accurately verify each return, suppressing the false positives which typically ruin text-based query expansion.
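Verification matters here because query expansion feeds results back into the query. In average query expansion, a new query is formed by averaging the original query vector with the vectors of the top results, and it is then re-issued. The sketch below restricts the average to results that passed spatial verification. The bag-of-words dict format, the `verified` set and `max_expand` are assumptions for illustration.

```python
def average_query_expansion(query_vec, results, verified, max_expand=5):
    """query_vec: {word: weight} bag-of-words vector of the query.
    results: ranked list of (doc_id, doc_vec); verified: set of doc_ids that
    passed spatial verification. Returns the mean of the query and up to
    max_expand verified result vectors, to be issued as a new query."""
    vecs = [query_vec] + [v for d, v in results if d in verified][:max_expand]
    expanded = {}
    for v in vecs:
        for w, x in v.items():
            expanded[w] = expanded.get(w, 0.0) + x
    return {w: x / len(vecs) for w, x in expanded.items()}
```

An unverified result (a likely false positive) never enters the average. This is the failure mode the summary says ruins text-based query expansion.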
Localizing Moments in Video with Natural Language
TLDR
The Moment Context Network (MCN) is proposed; it localizes natural language queries in videos by integrating local and global video features over time and outperforms several baseline methods.
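Combining local and global video features for moment localization can be sketched as follows. Assume the video is split into a fixed number of equal segments and every contiguous span of segments is a candidate moment. Each candidate is described by its local feature (mean over its own segments), a global context feature (mean over the whole video) and its normalised temporal endpoints. Candidates are ranked by squared distance to a query embedding. The segment features, the query embedding and the `embed` projection (identity by default) are hypothetical stand-ins for learned components.

```python
def candidate_moments(n_segments):
    """All contiguous spans (start, end), inclusive, over n_segments segments."""
    return [(s, e) for s in range(n_segments) for e in range(s, n_segments)]

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def moment_feature(segment_feats, start, end):
    """Local context (inside the moment) + global context (whole video)
    + temporal endpoints normalised to [0, 1]."""
    n = len(segment_feats)
    local = mean(segment_feats[start:end + 1])
    glob = mean(segment_feats)
    return local + glob + [start / n, (end + 1) / n]

def rank_moments(segment_feats, query_emb, embed=lambda f: f):
    """Rank candidates by squared distance between embed(moment) and query_emb.
    `embed` is a hypothetical placeholder for a learned visual projection."""
    scored = []
    for s, e in candidate_moments(len(segment_feats)):
        v = embed(moment_feature(segment_feats, s, e))
        d = sum((a - b) ** 2 for a, b in zip(v, query_emb))
        scored.append((d, (s, e)))
    scored.sort()
    return [m for _, m in scored]
```

With six segments, for instance, `candidate_moments(6)` yields 21 spans. This exhaustive enumeration stays cheap for a handful of segments but grows quadratically with their number.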