Disentangle Your Dense Object Detector

@article{Chen2021DisentangleYD,
  title={Disentangle Your Dense Object Detector},
  author={Zehui Chen and Chenhongyi Yang and Qiaofei Li and Feng Zhao and Zhengjun Zha and Feng Wu},
  journal={Proceedings of the 29th ACM International Conference on Multimedia},
  year={2021}
}
Deep learning-based dense object detectors have achieved great success in the past few years and have been applied to numerous multimedia applications such as video understanding. However, the current training pipeline for dense detectors is compromised by many conjunctions that may not hold. In this paper, we investigate three such important conjunctions: 1) only samples assigned as positive in the classification head are used to train the regression head; 2) classification and regression share…
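The first conjunction above couples the regression branch to the label assignment made for the classification branch. A minimal sketch of what decoupling the two assignments can look like is below, assuming a plain IoU-threshold assigner with separate, illustrative thresholds per branch; the paper's actual assignment strategy is more elaborate than this.

```python
import torch

def box_iou(anchors, gts):
    """Pairwise IoU between anchors (N, 4) and ground-truth boxes (M, 4), both in (x1, y1, x2, y2)."""
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gts[:, 2] - gts[:, 0]) * (gts[:, 3] - gts[:, 1])
    lt = torch.maximum(anchors[:, None, :2], gts[None, :, :2])   # (N, M, 2)
    rb = torch.minimum(anchors[:, None, 2:], gts[None, :, 2:])   # (N, M, 2)
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_g[None, :] - inter + 1e-6)

def assign(anchors, gts, pos_thr):
    """Return, per anchor, the index of the matched ground truth (or -1) under an IoU threshold."""
    iou = box_iou(anchors, gts)                 # (N, M)
    best_iou, best_gt = iou.max(dim=1)
    return torch.where(best_iou >= pos_thr, best_gt, torch.full_like(best_gt, -1))

anchors = torch.tensor([[0., 0., 10., 10.], [4., 4., 14., 14.], [20., 20., 30., 30.]])
gts = torch.tensor([[1., 1., 11., 11.]])

# Entangled pipeline: a single assignment drives both heads.
shared = assign(anchors, gts, pos_thr=0.5)

# Disentangled sketch: each head gets its own assignment (thresholds are made up for illustration).
cls_targets = assign(anchors, gts, pos_thr=0.5)
reg_targets = assign(anchors, gts, pos_thr=0.3)
print(shared, cls_targets, reg_targets)
```

With the illustrative thresholds, the second anchor becomes a positive for the regression branch only, showing how the two heads need not share one positive set.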

Citations

Prediction-Guided Distillation for Dense Object Detection
TLDR
This work shows that only a very small fraction of features within a ground-truth bounding box are responsible for a teacher's high detection performance, and proposes Prediction-Guided Distillation (PGD), which focuses distillation on these key predictive regions of the teacher and yields considerable gains in performance over many existing KD baselines.
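The general mechanism of distilling only where the teacher is predictive can be sketched as a masked feature-imitation loss; the quality map below is a stand-in assumption, not PGD's actual construction of its key predictive regions.

```python
import torch
import torch.nn.functional as F

def masked_feature_distillation(student_feat, teacher_feat, quality_map, eps=1e-6):
    """Imitate teacher features only where a teacher-derived quality map says they matter.

    student_feat, teacher_feat: (B, C, H, W); quality_map: (B, 1, H, W) in [0, 1],
    e.g. built from the teacher's predicted scores (a stand-in for PGD's key regions).
    """
    weight = quality_map / (quality_map.sum(dim=(2, 3), keepdim=True) + eps)
    per_pixel = F.mse_loss(student_feat, teacher_feat, reduction='none').mean(dim=1, keepdim=True)
    return (weight * per_pixel).sum()

# Toy usage with random tensors standing in for one pyramid level of student and teacher.
s = torch.randn(2, 256, 32, 32)
t = torch.randn(2, 256, 32, 32)
q = torch.rand(2, 1, 32, 32)
print(masked_feature_distillation(s, t, q).item())
```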
TextDCT: Arbitrary-Shaped Text Detection via Discrete Cosine Transform Mask
Arbitrary-shaped scene text detection is a challenging task due to the wide variation of text in font, size, color, and orientation. Most existing regression-based methods resort to regressing the…
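The core idea of representing a mask with a handful of low-frequency DCT coefficients can be sketched with SciPy; the mask size and the number of kept coefficients below are arbitrary choices, not the paper's settings.

```python
import numpy as np
from scipy.fft import dctn, idctn

def encode_mask(mask, k=8):
    """Keep only the top-left k x k low-frequency DCT coefficients of a binary mask."""
    coeffs = dctn(mask.astype(np.float64), norm='ortho')
    return coeffs[:k, :k]

def decode_mask(coeffs, size):
    """Reconstruct an approximate binary mask from the kept coefficients."""
    full = np.zeros(size)
    k = coeffs.shape[0]
    full[:k, :k] = coeffs
    return idctn(full, norm='ortho') > 0.5

# Toy text-like region: a filled rectangle inside a 64x64 mask.
mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:40, 10:55] = 1
recon = decode_mask(encode_mask(mask, k=8), mask.shape)
iou = (mask.astype(bool) & recon).sum() / (mask.astype(bool) | recon).sum()
print("IoU of reconstruction:", iou)
```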

References

SHOWING 1-10 OF 68 REFERENCES
Focal Loss for Dense Object Detection
TLDR
This paper proposes to address the extreme foreground-background class imbalance encountered during training of dense detectors by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples, and develops a novel Focal Loss, which focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
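The reweighting described in this summary reduces to a short formula; below is a self-contained binary focal loss in PyTorch, using the commonly cited defaults alpha=0.25 and gamma=2 (assumed here, not taken from this page).

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: cross entropy scaled by (1 - p_t)^gamma so easy examples are down-weighted.

    logits: raw predictions (any shape); targets: same shape, values in {0, 1}.
    """
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

logits = torch.tensor([4.0, -3.0, 0.1])   # one easy positive, one easy negative, one hard example
targets = torch.tensor([1.0, 0.0, 1.0])
print(focal_loss(logits, targets))        # the loss is dominated by the hard example
```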
Feature Pyramid Networks for Object Detection
TLDR
This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.
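The top-down pathway with lateral connections is compact enough to sketch directly; the channel counts and nearest-neighbour upsampling below are conventional defaults, not necessarily the exact configuration used with any particular backbone.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    """Minimal feature pyramid: 1x1 lateral convs, top-down upsample-and-add, 3x3 smoothing convs."""

    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):                      # feats: [C3, C4, C5], C5 is the coarsest
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 1, 0, -1):  # propagate semantics top-down
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode='nearest')
        return [s(l) for s, l in zip(self.smooth, laterals)]

# Toy feature maps mimicking strides 8 / 16 / 32 of a 256x256 input.
c3, c4, c5 = torch.randn(1, 256, 32, 32), torch.randn(1, 512, 16, 16), torch.randn(1, 1024, 8, 8)
print([p.shape for p in TinyFPN()([c3, c4, c5])])
```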
PyramidBox++: High Performance Detector for Finding Tiny Face
TLDR
Each part of PyramidBox is improved to further boost performance, including balanced data-anchor-sampling, Dual-PyramidAnchors, and a Dense Context Module, achieving state-of-the-art performance on the hard set.
Scale-Aware Trident Networks for Object Detection
TLDR
A novel Trident Network (TridentNet) aiming to generate scale-specific feature maps with a uniform representational power is proposed and a parallel multi-branch architecture in which each branch shares the same transformation parameters but with different receptive fields is constructed.
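Weight sharing across branches with different receptive fields can be sketched by reusing one convolution kernel under several dilation rates; the dilation values here are illustrative, not TridentNet's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedDilatedBranches(nn.Module):
    """One 3x3 kernel applied with several dilation rates: same parameters, different receptive fields."""

    def __init__(self, channels=256, dilations=(1, 2, 3)):
        super().__init__()
        self.dilations = dilations
        self.weight = nn.Parameter(torch.empty(channels, channels, 3, 3))
        self.bias = nn.Parameter(torch.zeros(channels))
        nn.init.kaiming_normal_(self.weight)

    def forward(self, x):
        # padding = dilation keeps the spatial size constant for a 3x3 kernel.
        return [F.conv2d(x, self.weight, self.bias, padding=d, dilation=d) for d in self.dilations]

x = torch.randn(1, 256, 32, 32)
outs = SharedDilatedBranches()(x)
print([o.shape for o in outs])   # three branches, identical shapes, one shared set of parameters
```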
End-to-End Object Detection with Fully Convolutional Network
TLDR
A Prediction-aware One-To-One (POTO) label assignment for classification is introduced to enable end-to-end detection and obtains performance comparable to NMS-based pipelines, and a simple 3D Max Filtering is proposed to exploit multi-scale features and improve the discriminability of convolutions in the local region.
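One-to-one assignment driven by a prediction-aware quality can be sketched with Hungarian matching; the quality used below, a geometric mean of classification score and IoU with an assumed alpha, is an illustrative stand-in rather than the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def one_to_one_assign(cls_scores, ious, alpha=0.8):
    """Match each ground truth to exactly one prediction by maximizing a prediction-aware quality.

    cls_scores: (N, M) score of prediction n for the class of gt m; ious: (N, M) IoU matrix.
    quality = score^(1-alpha) * IoU^alpha is an illustrative choice, not necessarily POTO's.
    """
    quality = (cls_scores ** (1 - alpha)) * (ious ** alpha)
    pred_idx, gt_idx = linear_sum_assignment(-quality)   # negate to maximize total quality
    keep = quality[pred_idx, gt_idx] > 0
    return pred_idx[keep], gt_idx[keep]

rng = np.random.default_rng(0)
cls_scores = rng.random((6, 2))   # 6 predictions, 2 ground-truth objects
ious = rng.random((6, 2))
print(one_to_one_assign(cls_scores, ious))   # exactly one prediction per ground truth
```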
Cascade R-CNN: Delving Into High Quality Object Detection
TLDR
A simple implementation of the Cascade R-CNN is shown to surpass all single-model object detectors on the challenging COCO dataset, and experiments show that it is widely applicable across detector architectures, achieving consistent gains independently of the baseline detector strength.
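The cascade idea, refining boxes stage by stage while raising the IoU threshold that defines a positive, can be sketched with a toy per-stage "regressor"; ROI feature extraction and the real training losses are omitted, and the refinement function below is purely hypothetical.

```python
import torch
from torchvision.ops import box_iou

def cascade_stage(proposals, gts, iou_thr, refine):
    """One cascade stage: count positives at this stage's IoU threshold, then refine the boxes."""
    iou = box_iou(proposals, gts)                  # (num_proposals, num_gts)
    best_iou, _ = iou.max(dim=1)
    num_pos = int((best_iou >= iou_thr).sum())
    return refine(proposals), num_pos

# A hypothetical "regressor" that nudges every box halfway toward the single ground truth below.
gts = torch.tensor([[10., 10., 50., 50.]])
refine = lambda boxes: boxes + 0.5 * (gts - boxes)

proposals = torch.tensor([[0., 0., 45., 45.], [30., 30., 80., 80.], [5., 12., 60., 48.]])
for thr in (0.5, 0.6, 0.7):                        # increasing quality requirement per stage
    proposals, num_pos = cascade_stage(proposals, gts, thr, refine)
    print(f"threshold {thr}: {num_pos} positives")
```

As the boxes improve from stage to stage, later stages can afford stricter thresholds without starving for positives, which is the intuition behind the cascade.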
YOLOv4: Optimal Speed and Accuracy of Object Detection
TLDR
This work uses new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, DropBlock regularization, and CIoU loss, and combines some of them to achieve state-of-the-art results: 43.5% AP on the MS COCO dataset at a real-time speed of ~65 FPS on a Tesla V100.
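Of the pieces listed, the CIoU loss is a concise formula; the sketch below follows the standard definition (one minus IoU, plus a normalized centre-distance penalty and an aspect-ratio consistency term) and is not taken from any particular implementation.

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """Complete-IoU loss for boxes in (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred.unbind(-1)
    tx1, ty1, tx2, ty2 = target.unbind(-1)
    pw, ph = (px2 - px1).clamp(min=eps), (py2 - py1).clamp(min=eps)
    tw, th = (tx2 - tx1).clamp(min=eps), (ty2 - ty1).clamp(min=eps)

    inter = (torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(min=0) * \
            (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(min=0)
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Squared distance between centres, normalised by the enclosing box diagonal.
    centre_dist = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4
    cw = torch.max(px2, tx2) - torch.min(px1, tx1)
    ch = torch.max(py2, ty2) - torch.min(py1, ty1)
    diag = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (torch.atan(tw / th) - torch.atan(pw / ph)) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)
    return (1 - iou + centre_dist / diag + alpha * v).mean()

pred = torch.tensor([[12., 12., 52., 48.]])
target = torch.tensor([[10., 10., 50., 50.]])
print(ciou_loss(pred, target))
```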
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
TLDR
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.
Accurate Face Detection for High Performance
TLDR
This report applies the Intersection over Union (IoU) loss function for regression, employs two-step classification and regression for detection, revisits data augmentation based on data-anchor-sampling for training, utilizes the max-out operation for classification, and uses a multi-scale testing strategy for inference.
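Among these ingredients, the max-out operation for classification is easy to illustrate: several background scores are predicted per anchor and collapsed by a max. The channel arrangement below (max over background channels only) is one common convention and may differ from the report's.

```python
import torch

def maxout_background_score(logits, num_bg=3):
    """Collapse several background predictions into one via max (max-out background label).

    logits: (N, num_bg + 1), the first num_bg columns scoring background and the last scoring face.
    Returns (N, 2) logits for background vs. face.
    """
    bg = logits[:, :num_bg].max(dim=1, keepdim=True).values
    fg = logits[:, num_bg:]
    return torch.cat([bg, fg], dim=1)

logits = torch.randn(5, 4)                     # 3 background channels + 1 face channel per anchor
print(maxout_background_score(logits).shape)   # torch.Size([5, 2])
```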