Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers

  • Zhen He, Jian Li, Daxue Liu, Hangen He, David Barber
  • Published 10 September 2018
  • Computer Science
  • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Online Multi-Object Tracking (MOT) from videos is a challenging computer vision task that has been studied extensively for decades. Most existing MOT algorithms are based on the Tracking-by-Detection (TBD) paradigm combined with popular machine learning approaches, which largely reduce the human effort needed to tune algorithm parameters. However, the commonly used supervised learning approaches require labeled data (e.g., bounding boxes), which are expensive to obtain for videos. Also, the TBD…

OneShotDA: Online Multi-Object Tracker With One-Shot-Learning-Based Data Association

This study applies a one-shot learning framework based on an attention mechanism to the multi-object tracking problem; the results show that the performance of the proposed method is comparable with that of current state-of-the-art methods.

SynDHN: Multi-Object Fish Tracker Trained on Synthetic Underwater Videos

  • M. A. Martija, P. Naval
  • Computer Science
    2020 25th International Conference on Pattern Recognition (ICPR)
  • 2021
This paper repurposes the Deep Hungarian Network (DHN) as the tracking component of the algorithm, using it to estimate the affinity between detector predictions from both spatial and appearance features.

Multiple Object Tracking in Deep Learning Approaches: A Survey

This paper focuses on giving a thorough review of the evolution of MOT in recent decades, investigating the recent advances in MOT, and showing some potential directions for future work.

Robust Unsupervised Multi-Object Tracking In Noisy Environments

AttU-Net shows better unsupervised MOT performance than variational inference-based state-of-the-art baselines, and the proposed single-head attention model helps limit the negative impact of noise by learning visual representations at different segment scales.

Benchmarking Unsupervised Object Representations for Video Sequences

A benchmark that compares the perceptual abilities of four object-centric approaches; the results suggest that architectures with unconstrained latent representations learn more powerful representations in terms of object detection, segmentation and tracking than spatial transformer based architectures.

Unmasking the Inductive Biases of Unsupervised Object Representations for Video Sequences

It is argued that the established evaluation protocol of multi-object tracking tests precisely these perceptual qualities, and a new benchmark dataset based on procedurally generated video sequences is proposed; the results suggest that this synthetic video benchmark may provide fruitful guidance towards learning more robust object-centric video representations.

Exploiting Spatial Invariance for Scalable Unsupervised Object Tracking

An architecture that scales well to the large-scene, many-object setting by employing spatially invariant computations (convolutions and spatial attention) and representations (a spatially local object specification scheme) is proposed.

Recent Advances in Embedding Methods for Multi-Object Tracking: A Survey

This survey gives a comprehensive overview and in-depth analysis of embedding methods in MOT from seven different perspectives, including patch-level embedding, single-frame embedding, cross-frame joint embedding, correlation embedding, sequential embedding, and cross-track relational embedding.

Unsupervised Multiple-Object Tracking with a Dynamical Variational Autoencoder

An unsupervised probabilistic model and associated estimation algorithm for multi-object tracking (MOT) based on a dynamical variational autoencoder (DVAE), called DVAE-UMOT, which is shown experimentally to compete well with, and even surpass, two state-of-the-art probabilistic MOT models.

AutoTrajectory: Label-free Trajectory Extraction and Prediction from Videos using Dynamic Points

This paper presents a novel, label-free algorithm, AutoTrajectory, for trajectory extraction and prediction that uses raw videos directly, and is believed to be the first to achieve unsupervised learning of trajectory extraction and prediction.



Learning to Track: Online Multi-object Tracking by Decision Making

This work formulates the online MOT problem as decision making in Markov Decision Processes (MDPs), where the lifetime of an object is modeled with an MDP, and learning a similarity function for data association is equivalent to learning a policy for the MDP.

Visual Tracking with Fully Convolutional Networks

An in-depth study of the properties of CNN features pre-trained offline on massive image data and the ImageNet classification task; the proposed tracker is shown to outperform the state of the art significantly.

Non-Markovian Globally Consistent Multi-object Tracking

This paper proposes a non-Markovian approach to imposing global consistency by using behavioral patterns to guide the tracking algorithm, and shows significant improvements both in supervised settings where ground truth is available and behavioral patterns can be learned from it, and in completely unsupervised settings.

Online Multiperson Tracking-by-Detection from a Single, Uncalibrated Camera

This paper proposes a novel approach for multiperson tracking-by-detection in a particle filtering framework that detects and tracks a large number of dynamically moving people in complex scenes with occlusions, requires no camera or ground plane calibration, and only makes use of information from the past.

Detection and Tracking of Multiple, Partially Occluded Humans by Bayesian Combination of Edgelet based Part Detectors

  • Bo Wu, R. Nevatia
  • Computer Science
    International Journal of Computer Vision
  • 2006
This work presents an approach to automatically detect and track multiple, possibly partially occluded humans in a walking or standing pose from a single camera, which may be stationary or moving.

Multi-target tracking by on-line learned discriminative appearance models

On-line learned discriminative appearance models (OLDAMs) discriminate between different targets significantly better than conventional holistic color histograms, and when integrated into a hierarchical association framework, they help improve tracking accuracy, particularly by reducing false alarms and identity switches.

People-tracking-by-detection and people-detection-by-tracking

This paper combines the advantages of both detection and tracking in a single framework using a hierarchical Gaussian process latent variable model (hGPLVM) and presents experimental results that demonstrate how this allows multiple people to be detected and tracked in cluttered scenes with recurring occlusions.

Multi-Object Tracking Through Simultaneous Long Occlusions and Split-Merge Conditions

This paper shows how to efficiently handle splitting and merging during track linking, so that the identities of objects that merge together and subsequently split can be maintained throughout long sequences with difficult conditions.

Multiple Object Tracking Using K-Shortest Paths Optimization

This paper shows that reformulating that step as a constrained flow optimization results in a convex problem and takes advantage of its particular structure to solve it using the k-shortest paths algorithm, which is very fast.

Robust Object Tracking by Hierarchical Association of Detection Responses

This work presents a detection-based three-level hierarchical association approach to robustly track multiple objects in crowded environments from a single camera and shows a great improvement in performance compared to previous methods.