BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training

@article{Cai2022BigDetectionAL,
  title={BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training},
  author={Likun Cai and Zhi-Li Zhang and Yi Zhu and Li Zhang and Mu Li and X. Xue},
  journal={2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year={2022},
  pages={4776--4786}
}
  • Likun Cai, Zhi-Li Zhang, X. Xue
  • Published 24 March 2022
  • Computer Science
  • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Multiple datasets and open challenges for object detection have been introduced in recent years. To build more general and powerful object detection systems, in this paper, we construct a new large-scale benchmark termed BigDetection. Our goal is to simply leverage the training data from existing datasets (LVIS, OpenImages and Objects365) with carefully designed principles, and curate a larger dataset for improved detector pre-training. Specifically, we generate a new taxonomy which unifies the… 
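
The idea of pooling several detection datasets under one taxonomy can be illustrated with a minimal sketch. The abstract is truncated before it describes how the taxonomy is actually built, so everything below is an assumption made for illustration: the `UNIFIED_LABEL` table, its label strings, the annotation dict layout, and the `unify` helper are hypothetical and are not BigDetection's real mapping or tooling.

```python
from collections import Counter

# Hypothetical synonym table: (source dataset, source label) -> unified label.
# These entries are illustrative only, not BigDetection's actual taxonomy.
UNIFIED_LABEL = {
    ("lvis", "automobile"): "car",
    ("openimages", "Car"): "car",
    ("objects365", "Car"): "car",
    ("lvis", "puppy"): "dog",
    ("openimages", "Dog"): "dog",
    ("objects365", "Dog"): "dog",
}


def unify(annotations):
    """Relabel annotations into the unified label space.

    Each annotation is a dict with 'dataset', 'label', and 'bbox' keys.
    Annotations whose (dataset, label) pair has no mapping are dropped.
    The input dicts are not modified; relabeled copies are returned.
    """
    merged = []
    for ann in annotations:
        key = (ann["dataset"], ann["label"])
        if key in UNIFIED_LABEL:
            merged.append({**ann, "label": UNIFIED_LABEL[key]})
    return merged


annotations = [
    {"dataset": "lvis", "label": "automobile", "bbox": (10, 20, 50, 60)},
    {"dataset": "openimages", "label": "Car", "bbox": (0, 0, 30, 30)},
    {"dataset": "objects365", "label": "Dog", "bbox": (5, 5, 40, 80)},
    {"dataset": "objects365", "label": "Unmapped", "bbox": (1, 1, 2, 2)},
]

merged = unify(annotations)
counts = Counter(ann["label"] for ann in merged)
print(counts)
```

Dropping unmapped labels is only one possible policy; a real pipeline might instead add them as new categories or route them to manual review.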

Parallel Pre-trained Transformers (PPT) for Synthetic Data-based Instance Segmentation

TLDR
A Parallel Pre-trained Transformers (PPT) framework is proposed to accomplish the synthetic data-based Instance Segmentation task, and the results are fused by a pixel-level Non-Maximum Suppression (NMS) algorithm to obtain more robust results.

ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation

TLDR
Experimental results show that the basic ViTPose model outperforms representative methods on the challenging MS COCO Keypoint Detection benchmark, while the largest model sets a new state-of-the-art.

Revealing the Dark Secrets of Masked Image Modeling

TLDR
This paper compares MIM with the long-dominant supervised pre-trained models from two perspectives, visualizations and experiments, to uncover their key representational differences, and finds that MIM models can perform significantly better than their supervised counterparts on geometric and motion tasks with weak semantics and on fine-grained classification tasks.

ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation

TLDR
It is shown that ViTPose can be easily pretrained using unlabeled pose data, without the need for the large-scale upstream ImageNet data (Deng et al., 2009), for better performance.

Soft-labeling Strategies for Rapid Sub-Typing

TLDR
This research provides a new method for automated data collection, curation, labeling, and iterative training with minimal human intervention for the case of overhead satellite imagery and object detection and takes advantage of a real-world instance where a cropped image of a car can automatically receive sub-type information as white or colorful from pixel values alone.

References


Objects365: A Large-Scale, High-Quality Dataset for Object Detection

  • Shuai Shao, Zeming Li, Jian Sun
  • Computer Science
  • 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
  • 2019
TLDR
Objects365 can serve as a better feature learning dataset for localization-sensitive tasks like object detection and semantic segmentation, and its better generalization ability has been verified on CityPersons, VOC segmentation, and ADE tasks.

USB: Universal-Scale Object Detection Benchmark

TLDR
The Universal-Scale object detection Benchmark (USB) is introduced, and fast and accurate object detectors called UniverseNets are designed, which surpass all baselines on USB and achieve state-of-the-art results on existing benchmarks.

Object Detection with a Unified Label Space from Multiple Datasets

TLDR
This work designs a framework which works with such partial annotations, and proposes loss functions that carefully integrate partial but correct annotations with complementary but noisy pseudo labels.

LVIS: A Dataset for Large Vocabulary Instance Segmentation

TLDR
This work introduces LVIS (pronounced ‘el-vis’): a new dataset for Large Vocabulary Instance Segmentation, which has a long tail of categories with few training samples due to the Zipfian distribution of categories in natural images.

You Only Look Once: Unified, Real-Time Object Detection

TLDR
Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.

Simple Multi-dataset Detection

TLDR
A simple method for training a unified detector on multiple large-scale datasets that performs as well as dataset-specific models on each training domain, and can generalize to new unseen datasets without fine-tuning on them.

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

TLDR
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.

Regionlets for Generic Object Detection

TLDR
This work proposes to model an object class by a cascaded boosting classifier that integrates various types of features from competing local regions, named regionlets, and significantly outperforms the state of the art on popular multi-class detection benchmark datasets with a single method.

We Don’t Need No Bounding-Boxes: Training Object Class Detectors Using Only Human Verification

TLDR
A new scheme for training object detectors which only requires annotators to verify bounding-boxes produced automatically by the learning algorithm, which iterates between re-training the detector, re-localizing objects in the training images, and human verification.

Towards Universal Object Detection by Domain Attention

TLDR
An effective and efficient universal object detection system that is capable of working on various image domains, from human faces and traffic signs to medical CT images, is developed by the introduction of a new family of adaptation layers, based on the principles of squeeze and excitation.