• Corpus ID: 231648084

1st Place Solution to ECCV-TAO-2020: Detect and Represent Any Object for Tracking

Fei Du, Boao Xu, Jiasheng Tang, Yuqi Zhang, F. Wang, Hao Li
We extend the classical tracking-by-detection paradigm to the tracking-any-object task. Solid detection results are first extracted from the TAO dataset, with state-of-the-art techniques such as Balanced Group Softmax (BAGS [7]) and DetectoRS [11] integrated during detection. We then learn appearance features that represent any object by training feature-learning networks, and we ensemble several models to improve both detection and feature representation. Simple linking strategies with most similar…
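The abstract is cut off before it describes the linking step, so the following is only a minimal sketch of one common way to link detections by appearance in a tracking-by-detection pipeline, not the authors' method. It assumes cosine similarity between appearance features, greedy one-to-one matching, and a similarity threshold; the names `cosine`, `link_frame`, and `min_sim` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_frame(tracks, detections, min_sim=0.5):
    """Greedily attach each detection to its most similar unmatched track.

    tracks: dict mapping track id -> last appearance feature (updated in place)
    detections: list of appearance features for the current frame
    min_sim: assumed threshold below which no link is made
    Returns one track id per detection; unmatched detections open new tracks.
    """
    # Score every (track, detection) pair and visit the best pairs first.
    pairs = sorted(
        ((cosine(feat, det), tid, j)
         for tid, feat in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    assigned = [None] * len(detections)
    used_tracks = set()
    for sim, tid, j in pairs:
        if sim < min_sim:
            break  # all remaining pairs are even less similar
        if tid in used_tracks or assigned[j] is not None:
            continue
        assigned[j] = tid
        used_tracks.add(tid)

    # Unmatched detections start new tracks; every track keeps its latest feature.
    next_id = max(tracks, default=-1) + 1
    for j, det in enumerate(detections):
        if assigned[j] is None:
            assigned[j] = next_id
            next_id += 1
        tracks[assigned[j]] = det
    return assigned
```

Calling `link_frame` once per frame, in temporal order, yields a track id for every detection. Greedy matching is chosen here only for brevity; an optimal assignment solver could replace the loop over `pairs` without changing the interface.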


Global Tracking Transformers
Experiments on the challenging TAO dataset show that the framework consistently improves upon baselines that are based on pairwise association, outperforming published works by a significant margin.
City-Scale Multi-Camera Vehicle Tracking Guided by Crossroad Zones
  • Chong Liu, Yuqi Zhang, Yiyan Shen
  • Computer Science
    2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
  • 2021
The solution to the Track 3 multi-camera vehicle tracking task in 2021 AI City Challenge (AICITY21) is described and the Tracklet Filter Strategy and the Direction Based Temporal Mask are proposed.
Opening up Open-World Tracking
A new benchmark, TAO-OW (Tracking Any Object in an Open World), is proposed, existing efforts in multi-object tracking are analyzed, and a baseline for this task is constructed while highlighting future challenges.
Reliable Multi-Object Tracking in the Presence of Unreliable Detections
It is found that RCT outperforms other algorithms when provided with imperfect detections, including state-of-the-art deep single and multi-object trackers as well as more classic approaches, and has the best average HOTA across methods that successfully return results for all sequences.


LVIS: A Dataset for Large Vocabulary Instance Segmentation
This work introduces LVIS (pronounced ‘el-vis’): a new dataset for Large Vocabulary Instance Segmentation, which has a long tail of categories with few training samples due to the Zipfian distribution of categories in natural images.
DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution
This paper proposes Recursive Feature Pyramid, which incorporates extra feedback connections from Feature Pyramid Networks into the bottom-up backbone layers and proposes Switchable Atrous Convolution, which convolves the features with different atrous rates and gathers the results using switch functions.
GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild
A large tracking database that offers an unprecedentedly wide coverage of common moving objects in the wild, called GOT-10k, and the first video trajectory dataset that uses the semantic hierarchy of WordNet to guide class population, which ensures a comprehensive and relatively unbiased coverage of diverse moving objects.
A Strong Baseline and Batch Normalization Neck for Deep Person Re-Identification
Extended experiments show that BNNeck can boost the baseline, and the baseline can improve the performance of existing state-of-the-art methods.
Overcoming Classifier Imbalance for Long-Tail Object Detection With Balanced Group Softmax
  • Yu Li, Tao Wang, Jiashi Feng
  • Computer Science
    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
This work provides the first systematic analysis of why state-of-the-art models underperform on long-tail distributions and proposes a novel balanced group softmax (BAGS) module that balances the classifiers within detection frameworks through group-wise training.
TAO: A Large-Scale Benchmark for Tracking Any Object
It is shown that existing single- and multi-object trackers struggle when applied to this scenario in the wild, and that detection-based, multi-object trackers are in fact competitive with user-initialized ones.
Bag of Tricks and a Strong Baseline for Deep Person Re-Identification
A simple and efficient baseline for person re-identification with deep neural networks is built by combining effective training tricks, achieving 94.5% rank-1 and 85.9% mAP on Market1501 using only global features.
MMDetection: Open MMLab Detection Toolbox and Benchmark
  • arXiv preprint arXiv:1906.07155
  • 2019
SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks
This work shows that the core reason Siamese trackers still have an accuracy gap is the lack of strict translation invariance, and proposes a new model architecture that performs depth-wise and layer-wise aggregations, which not only improves accuracy but also reduces model size.
Tracking Without Bells and Whistles
Overall, Tracktor yields better tracking performance than any current tracking method, and the analysis exposes remaining unsolved tracking challenges to inspire future research directions.