Corpus ID: 233025602

TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking

Authors: Peng Chu, Jiang Wang, Quanzeng You, Haibin Ling, Zicheng Liu

Tracking multiple objects in videos relies on modeling the spatial-temporal interactions of the objects. In this paper, we propose a solution named TransMOT, which leverages powerful graph transformers to efficiently model the spatial and temporal interactions among the objects. TransMOT effectively models the interactions of a large number of objects by arranging the trajectories of the tracked objects as a set of sparse weighted graphs, and constructing a spatial graph transformer encoder… 


Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking
A novel solution named TransSTAM, which leverages Transformer to effectively model both the appearance features of each object and the spatial-temporal relationships among objects, and achieves a clear performance improvement in both IDF1 and HOTA with respect to previous state-of-the-art approaches.
Fast Online and Relational Tracking
A novel interaction cue based on geometric features is presented, aiming to detect occlusion and re-identify lost targets at low computational cost; the method achieves state-of-the-art performance on MOT17 and comparable results on MOT20.
GTCaR: Graph Transformer for Camera Re-localization
A neural network approach with a graph Transformer backbone, namely GTCaR, to address the multi-view camera re-localization problem, which outperforms state-of-the-art approaches.
A template for the arxiv style
This paper introduces MASK (Multilevel Approximate Similarity search with k-means), an unconventional application of the k-Means algorithm as the foundation of a multilevel index structure for approximate similarity search, suitable for metric spaces.
Lightweight Indoor Multi-Object Tracking in Overlapping FOV Multi-Camera Environments
This paper addresses the cross-camera tracklet matching problem in scenarios with partially overlapping fields of view (FOVs) such as indoor multi-camera environments and uses a Kanade–Lucas–Tomasi algorithm-based frame-skipping method to reduce the computational overhead in object detection.
Towards Grand Unification of Object Tracking
For the first time, the great unification of the tracking network architecture and learning paradigm is accomplished, with Unicorn, a unified method that can simultaneously solve four tracking problems (SOT, MOT, VOS, MOTS) with a single network using the same model parameters.
Tracking Objects as Pixel-wise Distributions
P3AFormer adopts a meta-architecture to produce multi-scale object feature maps and yields 81.2% MOTA on the MOT17 benchmark, the first among all transformer networks to reach 80% MOTA in the literature.
Fast Vehicle Detection and Tracking on Fisheye Traffic Monitoring Video using CNN and Bounding Box Propagation
A reliable car detection and tracking algorithm based on the concept of bounding box propagation among frames is designed, which provides 17.9 percentage points and 6.2 pp.
BoT-SORT: Robust Associations Multi-Pedestrian Tracking
A new robust state-of-the-art tracker, which can combine the advantages of motion and appearance information, along with camera-motion compensation, and a more accurate Kalman filter state vector is presented.
AI-Driven Cell Tracking to Enable High-Throughput Drug Screening Targeting Airway Epithelial Repair for Children with Asthma
The open-source and easy-to-use software, EPIC, is expected to enable high-throughput drug screening targeting airway epithelial repair for children with asthma and can be applied in other cellular contexts by outperforming the same software in the Cell Tracking with Mitosis Detection Challenge (CTMC) dataset.


TransTrack: Multiple-Object Tracking with Transformer
This work proposes TransTrack, a baseline for MOT with Transformer, introduces a set of learned object queries into the pipeline to detect newly appearing objects, and demonstrates a simple and effective method based on the query-key mechanism that achieves a competitive 65.8% MOTA on the MOT17 challenge dataset.
TrackFormer: Multi-Object Tracking with Transformers
TrackFormer is introduced, an end-to-end trainable MOT approach based on an encoder-decoder Transformer architecture that achieves data association between frames via attention by evolving a set of track predictions through a video sequence.
Tracking Objects as Points
Tracking has traditionally been the art of following interest points through space and time. This changed with the rise of powerful deep networks. Nowadays, tracking is dominated by pipelines that
Learning a Neural Solver for Multiple Object Tracking
This work exploits the classical network flow formulation of MOT to define a fully differentiable framework based on Message Passing Networks (MPNs) and shows that learning in MOT does not need to be restricted to feature extraction, but it can also be applied to the data association step.
Tracking Without Bells and Whistles
Overall, Tracktor yields tracking performance superior to that of any current tracking method, and the analysis exposes remaining unsolved tracking challenges to inspire future research directions.
MOT16: A Benchmark for Multi-Object Tracking
A new release of the MOTChallenge benchmark, which focuses on multiple people tracking and offers not only a significant increase in the number of labeled boxes but also multiple object classes besides pedestrians and the level of visibility for every single object of interest.
MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking
Learning a Proposal Classifier for Multiple Object Tracking
A novel proposal-based learnable framework, which models MOT as a proposal generation, proposal scoring and trajectory inference paradigm on an affinity graph, and can solve the MOT problem in a data-driven way.
Rethinking the Competition Between Detection and ReID in Multiobject Tracking
A novel reciprocal network (REN) with a self-relation and cross-relation design is proposed so that each branch better learns task-dependent representations, alleviating the deleterious task competition and improving the cooperation between detection and ReID.
FGAGT: Flow-Guided Adaptive Graph Tracking
This article proposes the FGAGT tracker, which reaches state-of-the-art level, with MOTA exceeding FairMOT by 2.5 points and CenterTrack by 8.4 points on the MOT17 dataset.