Learning Dynamic Compact Memory Embedding for Deformable Visual Object Tracking

  • Pengfei Zhu, Hongtao Yu, Kaihua Zhang, Yu Wang, Shuai Zhao, Lei Wang, Tianzhu Zhang, Qinghua Hu
  • Published 23 November 2021
  • Computer Science
  • IEEE Transactions on Neural Networks and Learning Systems
Recently, template-based trackers have become the leading tracking algorithms, with promising efficiency and accuracy. However, the correlation operation between the query feature and the given template achieves accurate target localization but is prone to state estimation errors, especially when the target suffers from severe deformation. To address this issue, segmentation-based trackers have been proposed that use per-pixel matching to improve the tracking performance of…
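The correlation operation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes single-channel toy feature maps stored as plain Python lists, and the names `cross_correlate` and `peak_location` are hypothetical helpers introduced only for this example. The template is slid over the search region, and the peak of the resulting response map gives the estimated target position.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` (both 2-D lists of floats) and
    return the response map of dot products at each valid offset."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            # Dot product between the template and the window at (y, x).
            score = sum(
                search[y + i][x + j] * template[i][j]
                for i in range(th)
                for j in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def peak_location(response):
    """Return (row, col) of the highest response value."""
    best = max(
        (value, y, x)
        for y, row in enumerate(response)
        for x, value in enumerate(row)
    )
    return best[1], best[2]


# Toy single-channel "feature maps" (hypothetical values, not from the paper).
template = [[1.0, 2.0],
            [3.0, 4.0]]
search = [[0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 2.0],
          [0.0, 0.0, 3.0, 4.0],
          [0.0, 0.0, 0.0, 0.0]]

response = cross_correlate(search, template)
location = peak_location(response)
```

Note that the peak yields only a position. It says nothing about the target's extent or shape, which is consistent with the abstract's point that correlation localizes well but leaves state estimation error-prone under deformation.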



Learning Dynamic Memory Networks for Object Tracking

Proposes a dynamic memory network that adapts the template to the target's appearance variations during tracking. The memory can be easily enlarged as the memory requirements of a task increase, which is favorable for memorizing long-term object information.

Discriminative Segmentation Tracking Using Dual Memory Banks

This work presents a novel discriminative segmentation tracking architecture equipped with dual memory banks, i.e., an appearance memory bank and a spatial memory bank, which outperforms the leading segmentation tracker D3S on two video object segmentation benchmarks, DAVIS16 and DAVIS17.

D3S – A Discriminative Single Shot Segmentation Tracker

Without per-dataset finetuning and trained only for segmentation as the primary output, D3S outperforms all trackers on the VOT2016, VOT2018 and GOT-10k benchmarks and performs close to the state-of-the-art trackers on TrackingNet.

Towards Accurate Pixel-wise Object Tracking by Attention Retrieval

Proposes an attention retrieval network (ARN) that imposes soft spatial constraints on backbone features, and introduces a multi-resolution multi-stage segmentation network (MMS) to further weaken the influence of background clutter by reusing the predicted mask to filter backbone features.

ATOM: Accurate Tracking by Overlap Maximization

This work proposes a novel tracking architecture, consisting of dedicated target estimation and classification components, and introduces a classification component that is trained online to guarantee high discriminative power in the presence of distractors.

Siamese Instance Search for Tracking

It turns out that the learned matching function is so powerful that a simple tracker built upon it, coined Siamese INstance search Tracker, SINT, suffices to reach state-of-the-art performance.

Learning Discriminative Model Prediction for Tracking

Proposes an end-to-end tracking architecture capable of fully exploiting both target and background appearance information for target model prediction. The architecture is derived from a discriminative learning loss by designing a dedicated optimization process that can predict a powerful model in only a few iterations.

Fast and Accurate Online Video Object Segmentation via Tracking Parts

This paper proposes a fast and accurate video object segmentation algorithm that can start the segmentation process immediately upon receiving the images, and that performs favorably against state-of-the-art algorithms in accuracy on the DAVIS benchmark dataset while achieving much faster runtime performance.

High Performance Visual Tracking with Siamese Region Proposal Network

The Siamese region proposal network (Siamese-RPN) is proposed for visual object tracking. It is trained end-to-end off-line with large-scale image pairs and consists of a Siamese subnetwork for feature extraction and a region proposal subnetwork that includes a classification branch and a regression branch.

Deformable Siamese Attention Networks for Visual Object Tracking

This paper proposes SiamAttn, a new Siamese attention mechanism that computes deformable self-attention and cross-attention, capable of aggregating rich contextual interdependencies between the target template and the search image, for more accurate tracking.