ByteTrack: Multi-Object Tracking by Associating Every Detection Box

@inproceedings{Zhang2021ByteTrackMT,
  title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box},
  author={Yifu Zhang and Pei Sun and Yi Jiang and Dongdong Yu and Zehuan Yuan and Ping Luo and Wenyu Liu and Xinggang Wang},
  booktitle={European Conference on Computer Vision},
  year={2021}
}
Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects in videos. Most methods obtain identities by associating detection boxes whose scores are higher than a threshold. The objects with low detection scores, e.g. occluded objects, are simply thrown away, which brings non-negligible true object missing and fragmented trajectories. To solve this problem, we present a simple, effective and generic association method, tracking by associating almost every detection…
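
The association idea in the abstract can be illustrated with a minimal sketch: match tracks to high-score boxes first, then give the still-unmatched tracks a second chance against the low-score boxes instead of discarding them. This is not the authors' implementation. The function names, the threshold values (`high_thresh`, `low_thresh`, `min_iou`), and the use of greedy IoU matching are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: Dict[int, Box], dets: List[int],
                 boxes: List[Box], min_iou: float) -> Dict[int, int]:
    """Greedily pair track ids with detection indices by descending IoU.

    Assumption: greedy matching stands in for whatever assignment
    solver a real tracker would use.
    """
    pairs = sorted(
        ((iou(tbox, boxes[d]), tid, d)
         for tid, tbox in tracks.items() for d in dets),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, d in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if tid in matches or d in used:
            continue
        matches[tid] = d
        used.add(d)
    return matches


def associate(tracks: Dict[int, Box], boxes: List[Box], scores: List[float],
              high_thresh: float = 0.6, low_thresh: float = 0.1,
              min_iou: float = 0.3) -> Dict[int, int]:
    """Two-pass association: high-score boxes first, then low-score boxes.

    All threshold defaults are hypothetical values for this sketch.
    Returns a mapping from track id to detection index.
    """
    high = [i for i, s in enumerate(scores) if s >= high_thresh]
    low = [i for i, s in enumerate(scores) if low_thresh <= s < high_thresh]
    first = greedy_match(tracks, high, boxes, min_iou)
    # Tracks left unmatched get a second pass against low-score boxes.
    remaining = {tid: b for tid, b in tracks.items() if tid not in first}
    second = greedy_match(remaining, low, boxes, min_iou)
    return {**first, **second}


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
boxes = [(1, 1, 11, 11), (21, 21, 31, 31), (50, 50, 60, 60)]
scores = [0.9, 0.3, 0.2]
result = associate(tracks, boxes, scores)
```

In this toy input, track 2 overlaps only a box with score 0.3. A tracker that keeps only high-score boxes would leave it unmatched, while the second pass recovers it.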

ColorByte: A real time MOT method using fast appearance feature based on ByteTrack

A fast appearance feature, simple but relatively accurate, is designed to replace the cumbersome Re-ID component in MOT methods; it builds on ByteTrack, the new SOTA association algorithm on MOT benchmarks, which introduces an extra association step for objects with low scores.

Understanding Ethics, Privacy, and Regulations in Smart Video Surveillance for Public Safety

It is argued that ethical and privacy concerns could be addressed through four lenses: algorithm, system, model, and data.

High-fidelity ship imaging trajectory extraction via an instance segmentation model

This work proposes a framework that combines instance segmentation with an improved ByteTrack multi-target tracking algorithm: it first obtains the exact position and outline of the ship, then completes ship target tracking with ByteTrack, aiming to solve the problem of ship occlusion in video ship tracking.

ReFace: Improving Clothes-Changing Re-Identification With Face Features

This work introduces a new method that takes full advantage of the ability of existing ReID models to extract appearance-related features and combines it with a face feature extraction model to achieve new state-of-the-art results, both on image-based and video-based benchmarks.

Quo Vadis: Is Trajectory Forecasting the Key Towards Long-Term Multi-Object Tracking?

This paper shows that even a small yet diverse set of trajectory predictions for moving agents will significantly reduce this search space and thus improve long-term tracking robustness and advance state-of-the-art trackers on the MOTChallenge dataset.

QDTrack: Quasi-Dense Similarity Learning for Appearance-Only Multiple Object Tracking

Quasi-Dense Similarity Learning is presented, which densely samples hundreds of object regions on a pair of images for contrastive learning and which rivals the performance of state-of-the-art tracking methods on all benchmarks and sets a new state of the art on the large-scale BDD100K MOT benchmark, while introducing negligible computational overhead to the detector.

Detection Recovery in Online Multi-Object Tracking with Sparse Graph Tracker

SGT converts video data into a graph where detections, their connections, and the relational features of two connected nodes are represented by nodes, edges, and edge features, respectively, which allows SGT to track targets with tracking candidates selected by top-K scored detections with large K.

Towards Rich, Portable, and Large-Scale Pedestrian Data Collection

A portable data collection system is proposed that facilitates accessible large-scale data collection in diverse environments and is coupled with a semi-autonomous labeling pipeline for fast trajectory label production.

A Closer Look at the Joint Training of Object Detection and Re-Identification in Multi-Object Tracking

This work proposes Identity-aware Label Assignment, which jointly considers the assignment cost of detection and ReID to select positive samples for each instance without ambiguities, and advances a novel Discriminative Focal loss that integrates ReID predictions with Focal Loss to focus the training on the discriminative samples.

MOT-H: A Multi-Target Tracking Dataset Based on Horizontal View

MOT-H is meticulously annotated on crowded scenes from the horizontal view, with the primary goal of proving anti-jamming performance against complicated occlusions or even complete occlusion.
...

References

Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking

The two major novelties, chained structure and paired attentive regression, make CTracker simple, fast and effective, setting new MOTA records on MOT16 and MOT17 challenge datasets (67.6 and 66.6, respectively), without relying on any extra training data.

MOT16: A Benchmark for Multi-Object Tracking

A new release of the MOTChallenge benchmark that focuses on multiple people tracking; it not only offers a significant increase in the number of labeled boxes but also provides multiple object classes besides pedestrians and the level of visibility for every single object of interest.

Simple online and realtime tracking

Despite only using a rudimentary combination of familiar techniques such as the Kalman Filter and Hungarian algorithm for the tracking components, this approach achieves an accuracy comparable to state-of-the-art online trackers.

Improving Multiple Pedestrian Tracking by Track Management and Occlusion Handling

This work proposes a novel occlusion handling strategy that explicitly models the relation between occluding and occluded tracks, outperforming the feature-based approach while not depending on a separate re-identification network.

HOTA: A Higher Order Metric for Evaluating Multi-object Tracking

This work presents a novel MOT evaluation metric, higher order tracking accuracy (HOTA), which explicitly balances the effect of performing accurate detection, association and localization into a single unified metric for comparing trackers.

YOLOv3: An Incremental Improvement

We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that's pretty swell. It's a little bigger than last time but more

Towards Grand Unification of Object Tracking

For the first time, the great unification of the tracking network architecture and learning paradigm is accomplished, with Unicorn, a unified method that can simultaneously solve four tracking problems (SOT, MOT, VOS, MOTS) with a single network using the same model parameters.

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS

Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection

The proposed detector, named MIMDet, enables a MIM pre-trained vanilla ViT to outperform hierarchical Swin Transformer by 2.5 box AP and 2.6 mask AP on COCO, and achieves better results compared with the previous best adapted vanilla ViT detector using a more modest fine-tuning recipe.

MeMOT: Multi-Object Tracking with Memory

We propose an online tracking algorithm that performs the object detection and data association under a common framework, capable of linking objects after a long time span. This is realized by
...