Corpus ID: 235694413

Simple Training Strategies and Model Scaling for Object Detection

@article{Du2021SimpleTS,
  title={Simple Training Strategies and Model Scaling for Object Detection},
  author={Xianzhi Du and Barret Zoph and Wei-Chih Hung and Tsung-Yi Lin},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.00057}
}
The speed-accuracy Pareto curve of object detection systems has advanced through a combination of better model architectures, training methods, and inference methods. In this paper, we methodically evaluate a variety of these techniques to understand where most of the improvements in modern detection systems come from. We benchmark these improvements on the vanilla ResNet-FPN backbone with RetinaNet and RCNN detectors. The vanilla detectors are improved by 7.7% in accuracy while being 30% faster in…

Revisiting 3D ResNets for Video Recognition
TLDR
A simple scaling strategy for 3D ResNets is proposed which, in combination with improved training strategies and minor architectural changes, attains competitive performance on a large Web Video Text dataset.
Masked-attention Mask Transformer for Universal Image Segmentation
TLDR
Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic), sets a new state-of-the-art for panoptic segmentation, instance segmentation and semantic segmentation.

References

SHOWING 1-10 OF 42 REFERENCES
Focal Loss for Dense Object Detection
TLDR
This paper proposes to address the extreme foreground-background class imbalance encountered during training of dense detectors by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples, and develops a novel Focal Loss, which focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors
TLDR
A unified implementation of the Faster R-CNN, R-FCN and SSD systems is presented and the speed/accuracy trade-off curve created by using alternative feature extractors and varying other critical parameters such as image size within each of these meta-architectures is traced out.
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
TLDR
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%.
NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection
TLDR
Neural Architecture Search is adopted to discover a new feature pyramid architecture in a novel scalable search space covering all cross-scale connections; the discovered architecture, named NAS-FPN, achieves a better accuracy and latency tradeoff compared to state-of-the-art object detection models.
Feature Pyramid Networks for Object Detection
TLDR
This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
TLDR
This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.
Cascade R-CNN: Delving Into High Quality Object Detection
TLDR
A simple implementation of the Cascade R-CNN is shown to surpass all single-model object detectors on the challenging COCO dataset, and experiments show that it is widely applicable across detector architectures, achieving consistent gains independently of the baseline detector strength.
You Only Look Once: Unified, Real-Time Object Detection
TLDR
Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Revisiting ResNets: Improved Training and Scaling Strategies
TLDR
It is found that training and scaling strategies may matter more than architectural changes, and further, that the resulting ResNets match recent state-of-the-art models.