DSOD: Learning Deeply Supervised Object Detectors from Scratch

@article{Shen2017DSODLD,
  title={DSOD: Learning Deeply Supervised Object Detectors from Scratch},
  author={Zhiqiang Shen and Zhuang Liu and Jianguo Li and Yu-Gang Jiang and Yurong Chen and X. Xue},
  journal={2017 IEEE International Conference on Computer Vision (ICCV)},
  year={2017},
  pages={1937--1945}
}
We present Deeply Supervised Object Detector (DSOD), a framework that can learn object detectors from scratch. [...] Combining with several other principles, we develop DSOD following the single-shot detection (SSD) framework. Experiments on PASCAL VOC 2007, 2012 and MS COCO datasets demonstrate that DSOD can achieve better results than the state-of-the-art solutions with much more compact models. For instance, DSOD outperforms SSD on all three benchmarks with real-time detection speed, while…

Object Detection from Scratch with Deep Supervision
TLDR
Deeply Supervised Object Detectors (DSOD), an object detection framework that can be trained from scratch, is proposed based on the single-shot detection framework (SSD), and achieves consistently better results than the state-of-the-art methods with much more compact models.
ScratchDet: Exploring to Train Single-Shot Object Detectors from Scratch
TLDR
The ScratchDet achieves the state-of-the-art accuracy on PASCAL VOC 2007, 2012 and MS COCO among all the train-from-scratch detectors and even performs better than several one-stage pretrained methods.
ScratchDet: Training Single-Shot Object Detectors From Scratch
  • Rui Zhu, Shifeng Zhang, Tao Mei
  • Computer Science
    2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2019
TLDR
The ScratchDet achieves the state-of-the-art accuracy on PASCAL VOC 2007, 2012 and MS COCO among all the train-from-scratch detectors and even performs better than several one-stage pretrained methods.
DetNet: Design Backbone for Object Detection
TLDR
DetNet is proposed, a novel backbone network specifically designed for object detection that adds extra stages compared with traditional image-classification backbones while maintaining high spatial resolution in deeper layers.
DetNet: A Backbone network for Object Detection
TLDR
State-of-the-art results have been obtained for both object detection and instance segmentation on the MSCOCO benchmark based on the DetNet (4.8G FLOPs) backbone.
Improving Object Detection from Scratch via Gated Feature Reuse
TLDR
A novel gate-controlled prediction strategy enabled by Squeeze-and-Excitation to adaptively enhance or attenuate supervision at different scales based on the input object size is introduced, which is more effective in detecting diverse sizes of objects.
Learning Object Detectors from Scratch with Gated Recurrent Feature Pyramids
TLDR
This study proposes a recurrent feature-pyramid structure that squeezes rich spatial and semantic features into a single prediction layer, which further reduces the number of parameters to learn, and is the best-performing model for learning object detection from scratch.
Object Detection With Deep Learning: A Review
TLDR
This paper provides a review of deep learning-based object detection frameworks and focuses on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further.
Enabling Deep Residual Networks for Weakly Supervised Object Detection
TLDR
The intrinsic root cause is identified through sophisticated analysis, and a sequence of design principles is proposed to take full advantage of deep residual learning for WSOD from the perspectives of adding redundancy, improving robustness, and aligning features.
DAP: Detection-Aware Pre-training with Weak Supervision
TLDR
DAP can outperform the traditional classification pre-training in terms of both sample efficiency and convergence speed in downstream detection tasks including VOC and COCO and boosts the detection accuracy by a large margin when the number of examples in the downstream task is small.

References

SHOWING 1-10 OF 43 REFERENCES
PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection
This paper presents how we can achieve the state-of-the-art accuracy in multi-category object detection task while minimizing the computational cost by adapting and combining recent technical
R-FCN: Object Detection via Region-based Fully Convolutional Networks
TLDR
This work presents region-based, fully convolutional networks for accurate and efficient object detection, and proposes position-sensitive score maps to address a dilemma between translation-invariance in image classification and translation-variance in object detection.
Deeply-Supervised Nets
TLDR
The proposed deeply-supervised nets (DSN) method simultaneously minimizes classification error while making the learning process of hidden layers direct and transparent, and extends techniques from stochastic gradient methods to analyze the algorithm.
You Only Look Once: Unified, Real-Time Object Detection
TLDR
Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
TLDR
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.
Holistically-Nested Edge Detection
  • Saining Xie, Z. Tu
  • Computer Science
    2015 IEEE International Conference on Computer Vision (ICCV)
  • 2015
TLDR
HED performs image-to-image prediction by means of a deep learning model that leverages fully convolutional neural networks and deeply-supervised nets, and automatically learns rich hierarchical representations that are important in order to resolve the challenging ambiguity in edge and object boundary detection.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
TLDR
This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.
SSD: Single Shot MultiBox Detector
TLDR
The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors
TLDR
A unified implementation of the Faster R-CNN, R-FCN and SSD systems is presented and the speed/accuracy trade-off curve created by using alternative feature extractors and varying other critical parameters such as image size within each of these meta-architectures is traced out.
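
The default-box scheme summarized in the SSD entry above (a set of boxes over different aspect ratios and scales at each feature-map location) can be illustrated with a minimal Python sketch. The function names `default_boxes` and `layer_scales`, the normalized (cx, cy, w, h) box format, the square feature map, and the example scale bounds are assumptions made for illustration; this is not the SSD or DSOD reference implementation.

```python
import math


def default_boxes(fmap_size, scale, aspect_ratios):
    """Return normalized (cx, cy, w, h) default boxes for one square feature map.

    Hypothetical helper: one box per aspect ratio is centered on every cell.
    """
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Cell center, normalized to [0, 1] by the feature-map size.
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ar in aspect_ratios:
                # Width and height keep the area at scale**2 for any ratio.
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return boxes


def layer_scales(num_layers, s_min=0.2, s_max=0.9):
    """Linearly spaced box scales, one per prediction layer (illustrative bounds)."""
    if num_layers == 1:
        return [s_min]
    step = (s_max - s_min) / (num_layers - 1)
    return [s_min + step * k for k in range(num_layers)]


boxes = default_boxes(fmap_size=3, scale=0.5, aspect_ratios=(1.0, 2.0, 0.5))
print(len(boxes))  # 3 x 3 locations x 3 aspect ratios = 27
```

Each prediction layer would call `default_boxes` with its own feature-map size and the corresponding entry of `layer_scales`, so coarser maps cover larger objects.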