Transformer-based assignment decision network for multiple object tracking

Athena Psalta, Vasileios Tsironis, Konstantinos Karantzalos

Data association is a crucial component of any multiple object tracking (MOT) method that follows the tracking-by-detection paradigm. To generate complete trajectories, such methods employ a data association process to establish assignments between detections and existing targets at each timestep. Recent data association approaches either solve a multi-dimensional linear assignment task, solve a network flow minimization problem, or tackle the problem via multiple hypotheses tracking. However…
1 Citation

How to Backpropagate through Hungarian in Your DETR?

It is shown that the global loss can be expressed as the sum of an assignment-independent term and an assignment-dependent term, where the latter can be used to define the assignment cost matrix, and that backpropagation is carried out properly.

How to Train Your Deep Multi-Object Tracker

A differentiable proxy of MOTA and MOTP is proposed and combined in a loss function suitable for end-to-end training of deep multi-object trackers; the resulting approach establishes a new state of the art on the MOTChallenge benchmark.

Online Multi-Object Tracking Using Joint Domain Information in Traffic Scenarios

A novel method for visual tracking of multiple objects that puts together heterogeneous information from both the enlarged structural and temporal domains, and exhibits improved state-of-the-art performance on standard benchmarks.

Tracking Without Bells and Whistles

Overall, Tracktor yields tracking performance superior to that of any current tracking method, and the analysis exposes remaining, unsolved tracking challenges to inspire future research directions.

Multi-view 3D Object Detection Network for Autonomous Driving

This paper proposes Multi-View 3D networks (MV3D), a sensory-fusion framework that takes both LIDAR point clouds and RGB images as input and predicts oriented 3D bounding boxes. It also designs a deep fusion scheme to combine region-wise features from multiple views and enable interactions between intermediate layers of different paths.

MOT16: A Benchmark for Multi-Object Tracking

A new release of the MOTChallenge benchmark, which focuses on multiple people tracking and not only offers a significant increase in the number of labeled boxes, but also provides multiple object classes besides pedestrians and the level of visibility for every single object of interest.

Target Identity-aware Network Flow for online multiple target tracking

It is shown that automatically detecting and tracking targets in a single framework can help resolve the ambiguities due to frequent occlusion and heavy articulation of targets.

MOTChallenge: A Benchmark for Single-camera Multiple Target Tracking

This paper collects the first three releases of the MOTChallenge and provides a categorization of state-of-the-art trackers and a broad error analysis, to help newcomers understand the related work and research trends in the MOT community, and hopefully shed some light on potential future research directions.

ByteTrack: Multi-Object Tracking by Associating Every Detection Box

A simple, effective and generic association method named ByteTrack, which tracks by associating almost every detection box instead of only the high-score ones and achieves state-of-the-art performance on MOT20, HiEve and BDD100K tracking benchmarks.

MOTR: End-to-End Multiple-Object Tracking with TRansformer

MOTR is proposed, which extends DETR and introduces the “track query” to model tracked instances across the entire video, enhancing temporal relation modeling and serving as a stronger baseline for future research on temporal modeling and Transformer-based trackers.

TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking

A solution named TransMOT, which leverages powerful graph transformers to efficiently model the spatial and temporal interactions among the objects, and achieves state-of-the-art performance on all the datasets.