NetVLAD: CNN Architecture for Weakly Supervised Place Recognition
- R. Arandjelović, Petr Gronát, A. Torii, T. Pajdla, Josef Sivic
- Computer Science
- IEEE Transactions on Pattern Analysis and Machine…
- 23 November 2015
A convolutional neural network architecture that is trainable end-to-end directly for the place recognition task is developed, along with an efficient training procedure that can be applied to very large-scale weakly labelled tasks.
Three things everyone should know to improve object retrieval
- R. Arandjelović, Andrew Zisserman
- Computer Science
- IEEE Conference on Computer Vision and Pattern…
- 16 June 2012
A new method to compare SIFT descriptors (RootSIFT) which yields superior performance without increasing processing or storage requirements, and a novel method for query expansion where a richer model for the query is learnt discriminatively in a form suited to immediate retrieval through efficient use of the inverted index.
All About VLAD
- R. Arandjelović, Andrew Zisserman
- Computer Science
- IEEE Conference on Computer Vision and Pattern…
- 1 June 2013
It is shown that a simple change to the normalization method significantly improves retrieval performance, and that vocabulary adaptation can substantially alleviate the problems caused when images are added to the dataset after initial vocabulary learning.
On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models
- Sven Gowal, Krishnamurthy Dvijotham, Pushmeet Kohli
- Computer Science
- arXiv
- 30 October 2018
This work shows how a simple bounding technique, interval bound propagation (IBP), can be exploited to train large provably robust neural networks that beat the state of the art in verified accuracy, and that this allows the largest model to be verified beyond vacuous bounds on a downscaled version of ImageNet.
Convolutional Neural Network Architecture for Geometric Matching
- Ignacio Rocco, R. Arandjelović, Josef Sivic
- Computer Science
- Computer Vision and Pattern Recognition
- 16 March 2017
This work proposes a convolutional neural network architecture for geometric matching based on three main components that mimic the standard steps of feature extraction, matching and simultaneous inlier detection and model parameter estimation, while being trainable end-to-end.
24/7 Place Recognition by View Synthesis
- A. Torii, R. Arandjelović, Josef Sivic, M. Okutomi, T. Pajdla
- Computer Science
- IEEE Transactions on Pattern Analysis and Machine…
- 7 June 2015
A new place recognition approach is developed that combines an efficient synthesis of novel views with a compact indexable image representation and significantly outperforms other large-scale place recognition techniques on this challenging data.
Look, Listen and Learn
- R. Arandjelović, Andrew Zisserman
- Computer Science
- IEEE International Conference on Computer Vision
- 23 May 2017
Video itself contains a valuable but so far untapped source of information, the correspondence between the visual and the audio streams, and a novel “Audio-Visual Correspondence” learning task is introduced to make use of it.
NetVLAD: CNN Architecture for Weakly Supervised Place Recognition
- R. Arandjelović, Petr Gronát, A. Torii, T. Pajdla, Josef Sivic
- Computer Science
- Computer Vision and Pattern Recognition
- 23 November 2015
A convolutional neural network architecture that is trainable in an end-to-end manner directly for the place recognition task, and significantly outperforms non-learnt image representations and off-the-shelf CNN descriptors on two challenging place recognition benchmarks.
Neighbourhood Consensus Networks
- Ignacio Rocco, Mircea Cimpoi, R. Arandjelović, A. Torii, T. Pajdla, Josef Sivic
- Computer Science
- Neural Information Processing Systems
- 24 October 2018
An end-to-end trainable convolutional neural network architecture is developed that identifies sets of spatially consistent matches by analyzing neighbourhood consensus patterns in the 4D space of all possible correspondences between a pair of images, without the need for a global geometric model.
Objects that Sound
- R. Arandjelović, Andrew Zisserman
- Computer Science
- European Conference on Computer Vision
- 18 December 2017
New network architectures are designed that can be trained using the audio-visual correspondence (AVC) task for two functionalities: cross-modal retrieval and localizing the source of a sound in an image.