Luca Bertinetto

Learn More
The problem of arbitrary object tracking has traditionally been tackled by learning a model of the object's appearance exclusively online, using as sole training data the video itself. Despite the success of these methods, their online-only approach inherently limits the richness of the model they can learn. Recently, several attempts have been made to(More)
Correlation Filter-based trackers have recently achieved excellent performance, showing great robustness to challenging situations exhibiting motion blur and illumination changes. However, since the model that they learn depends strongly on the spatial layout of the tracked object, they are notoriously sensitive to deformation. Models based on colour(More)
The Visual Object Tracking challenge VOT2016 aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 70 trackers are presented, with a large number of trackers being published at major computer vision conferences and journals in the recent years. The number of tested state-of-the-art(More)
One-shot learning is usually tackled by using generative models or discriminative embeddings. Discriminative methods based on deep learning, which are very effective in other learning scenarios, are ill-suited for one-shot learning as they need large amounts of training data. In this paper, we propose a method to learn the parameters of a deep model in one(More)
The Thermal Infrared Visual Object Tracking challenge 2016, VOT-TIR2016, aims at comparing short-term single-object visual track-ers that work on thermal infrared (TIR) sequences and do not apply pre-learned models of object appearance. VOT-TIR2016 is the second benchmark on short-term tracking in TIR sequences. Results of 24 track-ers are presented. For(More)
Test image: 255x255x3 17x17x32 49x49x32 Correlation Filter Crop ★ 33x33x1 CNN CNN 49x49x32 Figure 1: Overview of the proposed network architecture, CFNet. It is an asymmetric Siamese network: after applying the same convolutional feature transform to both input images, the " training image " is used to learn a linear template, which is then applied to(More)
This paper addresses the problem of retrieving those shots from a database of video sequences that match a query image. Existing architectures match the images using a high-level representation of local features extracted from the video database, and are mainly based on Bag ofWords model. Such architectures lack however the capability to scale up to very(More)
  • 1