Robust Visual Tracking via Hierarchical Convolutional Features

Chao Ma, Jia-Bin Huang, Xiaokang Yang, and Ming-Hsuan Yang. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Visual tracking is challenging because target objects often undergo significant appearance changes caused by deformation, abrupt motion, background clutter, and occlusion. The method learns adaptive correlation filters on the outputs of each convolutional layer to encode the target appearance, then infers the maximum response of each layer to locate the target in a coarse-to-fine manner.
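The coarse-to-fine inference described above can be sketched roughly as follows. This is a simplified illustration, not the paper's exact procedure: the function name `coarse_to_fine_localize` and the fixed search-window `radius` are assumptions, and real implementations weight layers and interpolate responses.

```python
import numpy as np

def coarse_to_fine_localize(responses, radius=2):
    # responses: 2-D correlation response maps ordered from the deepest
    # (most semantic, coarse) layer to the shallowest (most spatial, fine),
    # all resized to the same H x W grid. radius is a hypothetical
    # half-size of the refinement window used at each finer layer.
    y, x = np.unravel_index(np.argmax(responses[0]), responses[0].shape)
    for r in responses[1:]:
        h, w = r.shape
        # restrict the finer layer's search to a window around the
        # coarser layer's maximum, then update the position estimate
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        win = r[y0:y1, x0:x1]
        dy, dx = np.unravel_index(np.argmax(win), win.shape)
        y, x = y0 + dy, x0 + dx
    return y, x
```

A fine-layer distractor outside the coarse layer's window is ignored, which is the point of constraining the search hierarchically.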

Multiple Convolutional Features in Siamese Networks for Object Tracking

This paper argues that using a single convolutional layer as feature representation is not an optimal choice in a deep similarity framework, and presents a Multiple Features-Siamese Tracker (MFST), a novel tracking algorithm exploiting several hierarchical feature maps for robust tracking.

Adaptive Exploitation of Pre-trained Deep Convolutional Neural Networks for Robust Visual Tracking

Owing to automatic feature extraction via multi-layer nonlinear transformations, deep learning-based visual trackers have recently achieved great success in challenging scenarios.

Learning Compact Target-Oriented Feature Representations for Visual Tracking

A novel approach is proposed that combines the good generalization of generative models with the strong discrimination of discriminative models for visual tracking; it clearly outperforms baseline trackers with only a modest impact on frame rate and performs comparably against state-of-the-art methods.

Improved Hierarchical Convolutional Features for Robust Visual Object Tracking

An improved hierarchical convolutional features model is integrated into a correlation filter framework for visual object tracking; it improves tracking performance and robustness and achieves competitive results against state-of-the-art trackers.

Deep Collaborative Tracking Networks

The deep collaborative tracking network (DCTN) is a unified framework that jointly encodes appearance and motion information for generic object tracking, presented as the first generalized deep learning framework to model both cues for this task.

Visual Object Tracking based on Adaptive Siamese and Motion Estimation Network

Robust Visual Tracking via Hierarchical Particle Filter and Ensemble Deep Features

This paper proposes to exploit deep convolutional features with few particles in a novel hierarchical particle filter, which formulates correlation filters as observation models and breaks the standard particle filter framework down into two constituent layers: a particle translation layer and a particle scale layer.
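The two-layer decomposition described above might be sketched as follows. Everything here is a hypothetical simplification: `hierarchical_particle_step`, the sample counts, and the noise scales are assumptions, and `score_fn` merely stands in for the correlation-filter observation model.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def hierarchical_particle_step(pos, scale, score_fn, n_trans=20, n_scale=10):
    # pos: current 2-D target position estimate; scale: current scale.
    # --- particle translation layer: sample candidate positions ---
    cands = pos + rng.normal(0.0, 4.0, size=(n_trans, 2))
    best_pos = cands[np.argmax([score_fn(c, scale) for c in cands])]
    # --- particle scale layer: sample scales around the best position ---
    scales = scale * np.exp(rng.normal(0.0, 0.05, size=n_scale))
    best_scale = scales[np.argmax([score_fn(best_pos, s) for s in scales])]
    return best_pos, best_scale
```

Splitting translation from scale keeps the number of observation-model evaluations at `n_trans + n_scale` instead of the `n_trans * n_scale` a joint search would need.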

Deformable Object Tracking With Gated Fusion

This paper proposes a deformable convolution layer to enrich target appearance representations in the tracking-by-detection framework, together with a gated fusion scheme that controls how the variations captured by the deformable convolution affect the original appearance.
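A minimal sketch of such a gating mechanism; the function name and the element-wise sigmoid gate are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def gated_fusion(base_feat, deform_feat, gate_logits):
    # A sigmoid gate in [0, 1] decides, per element, how much of the
    # deformable-convolution features replaces the original (base)
    # appearance features. In a real tracker the gate logits would be
    # predicted by a small learned subnetwork.
    gate = 1.0 / (1.0 + np.exp(-gate_logits))
    return gate * deform_feat + (1.0 - gate) * base_feat
```

With strongly negative logits the gate shuts and the original appearance passes through unchanged; with zero logits the two feature maps are averaged.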

Visual Tracking via Deep Feature Fusion and Correlation Filters

A hybrid tracker is built that combines deep features with correlation filters to address the low tracking accuracy caused by the limitations of traditional hand-crafted feature models; the rich hierarchical features of convolutional neural networks are fused across multiple layers to improve the tracker's learning accuracy.



Robust Visual Tracking via Convolutional Networks Without Training

It is presented that, even without offline training with a large amount of auxiliary data, simple two-layer convolutional networks can be powerful enough to learn robust representations for visual tracking.

Visual Tracking with Fully Convolutional Networks

An in-depth study of the properties of CNN features pre-trained offline on massive image data for the ImageNet classification task; the proposed tracker significantly outperforms the state of the art.

Video Tracking Using Learned Hierarchical Features

This paper learns, offline, features robust to diverse motion patterns from auxiliary video sequences via a two-layer convolutional neural network, and proposes a domain adaptation module to adapt the pre-learned features online to the specific target object.

Transferring Rich Feature Hierarchies for Robust Visual Tracking

This work pre-trains a CNN offline and transfers the learned rich feature hierarchies to online tracking, and proposes generating a probability map, instead of a simple class label, to fit the characteristics of object tracking.

Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network

An online visual tracking algorithm that learns a discriminative saliency map with a convolutional neural network, using the network's hidden layers to improve target localization accuracy and achieve pixel-level target segmentation.

DeepTrack: Learning Discriminative Feature Representations by Convolutional Neural Networks for Visual Tracking

A model-free tracker is proposed that outperforms existing state-of-the-art algorithms and rarely loses track of the target object; a class-specific version tailored to tracking a particular object class, such as human faces, is also introduced.

Learning a Deep Compact Image Representation for Visual Tracking

Comparison with state-of-the-art trackers on challenging benchmark video sequences shows that the deep learning tracker is more accurate while maintaining low computational cost, achieving real-time performance with a MATLAB implementation on a modest graphics processing unit (GPU).

Learning Multi-domain Convolutional Neural Networks for Visual Tracking

A novel visual tracking algorithm is proposed, based on the representations of a convolutional neural network discriminatively trained on a large set of videos with tracking ground truths to obtain a generic target representation.

Human Tracking Using Convolutional Neural Networks

This paper treats tracking as the learning problem of estimating an object's location and scale given its previous location and scale together with the current and previous image frames, and introduces multiple pathways in the CNN to better fuse local and global information.

DeepProposal: Hunting Objects by Cascading Deep Convolutional Layers

An inverse cascade is built that, moving from the final to the initial convolutional layers of the CNN, efficiently selects the most promising object locations and refines their boxes in a coarse-to-fine manner.