SANet: Structure-Aware Network for Visual Tracking

  • Heng Fan, Haibin Ling
  • Published 21 November 2016
  • Computer Science
  • 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Convolutional neural networks (CNNs) have drawn increasing interest in visual tracking owing to their power in feature extraction. Most existing CNN-based trackers treat tracking as a classification problem. However, these trackers are sensitive to similar distractors because their CNN models mainly focus on inter-class classification. To address this problem, we use the self-structure information of the object to distinguish it from distractors. Specifically, we utilize a recurrent neural network (RNN…
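The abstract is truncated above, so the paper's exact architecture is not reproduced here. As a rough illustration of the underlying idea — encoding an object's spatial self-structure by running a recurrent network over the positions of a CNN feature map — the following is a minimal pure-Python sketch; all names, shapes, and the raster-scan traversal order are illustrative assumptions, not the paper's design:

```python
import math

def rnn_over_grid(features, Wx, Wh, b):
    """Run a vanilla RNN over CNN feature vectors visited in raster order.

    features: list of H*W feature vectors (each a list of floats)
    Wx, Wh:   input-to-hidden and hidden-to-hidden weight matrices (lists of rows)
    b:        bias vector
    Returns the final hidden state, a structure-aware summary of the map.
    """
    h = [0.0] * len(b)
    for x in features:  # raster-scan traversal of the feature-map grid
        pre = [
            sum(wx_ij * x_j for wx_ij, x_j in zip(Wx[i], x))
            + sum(wh_ij * h_j for wh_ij, h_j in zip(Wh[i], h))
            + b[i]
            for i in range(len(b))
        ]
        h = [math.tanh(p) for p in pre]  # new hidden state
    return h

# Toy example: three 2-d feature vectors, 2-d hidden state.
summary = rnn_over_grid(
    [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]],
    Wx=[[1.0, 0.0], [0.0, 1.0]],  # identity input weights
    Wh=[[0.0, 0.0], [0.0, 0.0]],  # no recurrence, for a readable demo
    b=[0.0, 0.0],
)
```

Because the hidden state is threaded across positions, each step can condition on the parts of the object already seen — which is what lets a recurrent model capture intra-object structure that a per-patch classifier ignores.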


Object-Adaptive LSTM Network for Visual Tracking
This paper proposes a novel object-adaptive LSTM network that can effectively exploit sequence dependencies and dynamically adapt to temporal object variations by constructing an intrinsic model of object appearance and motion, and develops an efficient strategy for proposal selection.
Visual Tracking with Attentional Convolutional Siamese Networks
A novel Attentional Convolutional Siamese Network for visual tracking (ACST) is proposed to improve the classical AlexNet by fusing spatial and channel attention during feature learning; a response-based weighted sampling strategy during training strengthens the discriminative power to distinguish two objects with similar attributes.
Learning Spatial-Channel Attention for Visual Tracking
This work leverages spatial attention and channel attention to enhance object features without much extra computational cost, and proposes an inter-instance loss to make the tracker aware not only of target-background classification but also of instance classification across multiple domains.
Recurrent Filter Learning for Visual Tracking
This paper directly feeds the target's image patch to a recurrent neural network (RNN) to estimate an object-specific filter for tracking, and extends the matrix multiplications of the RNN's fully-connected layers to convolution operations on feature maps, which preserves the target's spatial structure and is also memory-efficient.
Visual Object Tracking via Graph Convolutional Representation
This work employs a GCN module to learn structural features for visual tracking, and utilizes a dual path network to extract heterogeneous features and uses the attention mechanism to adaptively select features.
Multi-branch Siamese Network for High Performance Online Visual Tracking
This work proposes a Multi-branch Siamese network (MSiam) for high-performance object tracking, which performs layer-wise feature aggregation and simultaneously considers global-local patterns for more accurate target tracking, and proposes a feature aggregation module (FAM) that preserves the heterogeneity of the three types of features.
Auto-Selecting Receptive Field Network for Visual Tracking
An Auto-Selecting Receptive Field Network (ASRF) is proposed to dynamically select receptive-field information and effective clues; it adaptively adjusts the receptive field size of each neuron according to multiple scales of input information.


Visual Tracking with Fully Convolutional Networks
An in-depth study of the properties of CNN features offline pre-trained on massive image data and the classification task on ImageNet shows that the proposed tracker significantly outperforms the state of the art.
Human Tracking Using Convolutional Neural Networks
This paper treats tracking as a learning problem of estimating the location and the scale of an object given its previous location, scale, as well as current and previous image frames, and introduces multiple path ways in CNN to better fuse local and global information.
Learning Multi-domain Convolutional Neural Networks for Visual Tracking
A novel visual tracking algorithm is proposed, based on representations from a Convolutional Neural Network discriminatively trained on a large set of videos with tracking ground-truths to obtain a generic target representation.
Hierarchical Convolutional Features for Visual Tracking
This paper adaptively learns correlation filters on each convolutional layer to encode the target appearance and hierarchically infers the maximum response of each layer to locate targets.
Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network
An online visual tracking algorithm that learns a discriminative saliency map with a Convolutional Neural Network, exploiting the hidden layers of the network to improve target localization accuracy and achieve pixel-level target segmentation.
Saliency Detection with Recurrent Fully Convolutional Networks
This paper develops a new saliency model using recurrent fully convolutional networks (RFCNs) that is able to incorporate saliency prior knowledge for more accurate inference and enables the network to capture generic representations of objects for saliency detection.
Hedged Deep Tracking
A novel CNN-based tracking framework is proposed that takes full advantage of features from different CNN layers and uses an adaptive Hedge method to combine several CNN-based trackers into a single stronger one.
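The adaptive Hedge of that paper modifies the expert losses with a regret term; the plain Hedge (multiplicative-weights) update it builds on can be sketched as follows. The function name and learning rate `eta` are illustrative, not from the paper:

```python
import math

def hedge_update(weights, losses, eta=1.0):
    """One round of the standard Hedge update: each expert's weight is
    scaled by exp(-eta * loss), then the weights are renormalized."""
    scaled = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]

# Three trackers start with equal weights; the one with zero loss
# gains weight relative to the other two.
w = hedge_update([1/3, 1/3, 1/3], [0.0, 1.0, 1.0])
```

In a tracker ensemble, each "expert" is one CNN-layer-based tracker, its per-frame loss measures how far its prediction falls from the fused result, and the updated weights decide how much each layer contributes to the next frame's response map.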
Learning a Deep Compact Image Representation for Visual Tracking
Comparison with state-of-the-art trackers on challenging benchmark video sequences shows that the deep learning tracker is more accurate while maintaining low computational cost, achieving real-time performance when the MATLAB implementation is run with a modest graphics processing unit (GPU).
A Siamese Long Short-Term Memory Architecture for Human Re-identification
A novel siamese Long Short-Term Memory (LSTM) architecture that can process image regions sequentially and enhance the discriminative capability of local feature representation by leveraging contextual information.
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.