Real-Time MDNet

@inproceedings{Jung2018RealTimeM,
  title={Real-Time MDNet},
  author={Ilchae Jung and Jeany Son and Mooyeol Baek and Bohyung Han},
  booktitle={ECCV},
  year={2018}
}
We present a fast and accurate visual tracking algorithm based on the multi-domain convolutional neural network (MDNet). [...] We also introduce a novel loss term to differentiate foreground instances across multiple domains and learn a more discriminative embedding of target objects with similar semantics. The proposed techniques are integrated into the pipeline of a well-known CNN-based visual tracking algorithm, MDNet.
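
The abstract mentions a loss term that separates foreground instances across domains. As a rough illustration only, the sketch below assumes one plausible reading: each domain branch assigns a foreground score to the same positive sample, and a cross-entropy over domains rewards the branch of the video the sample came from. The function names, the score layout, and the exact form of the loss are assumptions made for this example, not the authors' formulation.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def instance_embedding_loss(fg_scores, true_domain):
    """Cross-entropy over domains for one positive sample.

    fg_scores[d] is the foreground score that domain d's branch assigns
    to the sample (hypothetical layout); true_domain is the index of the
    video the sample was actually drawn from.
    """
    probs = softmax(fg_scores)
    return -math.log(probs[true_domain])


# The same sample scored by three domain branches (made-up numbers).
confident = instance_embedding_loss([4.0, 0.5, 0.2], true_domain=0)
confused = instance_embedding_loss([1.0, 1.0, 1.0], true_domain=0)

# A target that only its own domain scores highly incurs a smaller loss
# than one that every domain scores equally.
print(confident < confused)
```

Under this reading, the loss is low when only the sample's own domain scores it as foreground, which pushes targets with similar semantics in different videos toward distinct embeddings.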

Feature selection accelerated convolutional neural networks for visual tracking

TLDR
A time-efficient and accurate tracking scheme, a feature selection accelerated CNN (FSNet) based on MDNet (Multi-Domain Network), which reaches 60 FPS on the GPU, compared with 1 FPS for the original MDNet, with a very low impact on tracking accuracy.

Real-time Object Tracking Based on Improved Adversarial Learning

TLDR
An improved tracking model based on adversarial learning is proposed to accelerate feature extraction, and a Precise ROI Pooling (PrROIPooling) based algorithm for extracting more accurate representations of targets is presented.

Multi-domain Collaborative Feature Representation for Robust Visual Object Tracking

TLDR
This paper proposes a Common Features Extractor (CFE) to learn potential common representations from the RGB domain and the event domain, and utilizes a Unique Extractor for Event (UEE) based on Spiking Neural Networks to extract edge cues in the event domain.

HROM: Learning High-Resolution Representation and Object-Aware Masks for Visual Object Tracking

TLDR
A novel high-resolution Siamese network is proposed, which connects the high-to-low resolution convolution streams in parallel and repeatedly exchanges information across resolutions to maintain high-resolution representations.

Learning Spatio-Temporal Transformer for Visual Tracking

TLDR
A new tracking architecture with an encoder-decoder transformer as the key component, in which the encoder models the global spatio-temporal feature dependencies between target objects and search regions, while the decoder learns a query embedding to predict the spatial positions of the target objects.

Fully convolutional adaptive tracker with real time performance

TLDR
The proposed FCAT performs competitively with state-of-the-art visual trackers while maintaining real-time tracking speeds of over 30 frames per second; its effectiveness was illustrated on surveillance-style videos.

Deep Flow Collaborative Network for Online Visual Tracking

TLDR
A deep flow collaborative network is designed, which executes the expensive feature network only on sparse keyframes and transfers the feature maps to other frames via optical flow, and introduces an effective adaptive keyframe scheduling mechanism to select the most appropriate keyframe.

Unsupervised Deep Representation Learning for Real-Time Tracking

TLDR
This work proposes an unsupervised learning method for visual tracking that achieves the baseline accuracy of classic fully supervised trackers while achieving a real-time speed and exhibits a potential in leveraging more unlabeled or weakly labeled data to further improve the tracking accuracy.

Siam R-CNN: Visual Tracking by Re-Detection

TLDR
This work presents Siam R-CNN, a Siamese re-detection architecture which unleashes the full power of two-stage object detection approaches for visual object tracking, and combines this with a novel tracklet-based dynamic programming algorithm to model the full history of both the object to be tracked and potential distractor objects.

Dual Siamese network for RGBT tracking via fusing predicted position maps

TLDR
This work proposes a response-level fusion tracking algorithm that employs deep learning, performs very well, and runs at 116 frames per second, which far exceeds the real-time requirement of 25 frames per second.
...

References

SHOWING 1-10 OF 35 REFERENCES

Visual Tracking with Fully Convolutional Networks

TLDR
An in-depth study of the properties of CNN features pre-trained offline on massive image data and the ImageNet classification task; the proposed tracker is shown to outperform the state of the art significantly.

Learning Multi-domain Convolutional Neural Networks for Visual Tracking

TLDR
A novel visual tracking algorithm based on the representations from a discriminatively trained Convolutional Neural Network using a large set of videos with tracking ground-truths to obtain a generic target representation.

Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network

TLDR
An online visual tracking algorithm that learns a discriminative saliency map with a Convolutional Neural Network, using hidden layers of the network to improve target localization accuracy and achieve pixel-level target segmentation.

Modeling and Propagating CNNs in a Tree Structure for Visual Tracking

TLDR
An online visual tracking algorithm by managing multiple target appearance models in a tree structure using Convolutional Neural Networks to represent target appearances, where multiple CNNs collaborate to estimate target states and determine the desirable paths for online model updates in the tree.

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

TLDR
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.

Robust Object Tracking Based on Temporal and Spatial Deep Networks

TLDR
A new deep architecture which incorporates the temporal and spatial information to boost the tracking performance is presented, and competing performance of the proposed tracker over a number of state-of-the-art algorithms is demonstrated.

Fully-Convolutional Siamese Networks for Object Tracking

TLDR
A basic tracking algorithm is equipped with a novel fully-convolutional Siamese network trained end-to-end on the ILSVRC15 dataset for object detection in video and achieves state-of-the-art performance in multiple benchmarks.

Good Features to Correlate for Visual Tracking

TLDR
Extensive performance analysis shows the efficacy of the proposed custom design in the CFB tracking framework: by fine-tuning the convolutional parts of a state-of-the-art network and integrating this model into a CFB tracker, the top performer of VOT2016, an 18% increase is achieved in terms of expected average overlap.

SANet: Structure-Aware Network for Visual Tracking

  • Heng Fan, Haibin Ling
  • Computer Science
    2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
  • 2017
TLDR
This work utilizes a recurrent neural network (RNN) to model object structure and incorporates it into a CNN to improve robustness to similar distractors, and shows that the proposed algorithm outperforms other methods.

CREST: Convolutional Residual Learning for Visual Tracking

TLDR
This paper proposes the CREST algorithm to reformulate DCFs as a one-layer convolutional neural network, and applies residual learning to take appearance changes into account to reduce model degradation during online update.