StairNet: Top-Down Semantic Aggregation for Accurate One Shot Detection

@article{Woo2018StairNetTS,
  title={StairNet: Top-Down Semantic Aggregation for Accurate One Shot Detection},
  author={Sanghyun Woo and Soonmin Hwang and In-So Kweon},
  journal={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2018},
  pages={1093--1102}
}
One-stage object detectors such as SSD or YOLO have already shown promising accuracy with a small memory footprint and fast speed. However, it is widely recognized that one-stage detectors have difficulty detecting small objects, even though they are competitive with two-stage methods on large objects. In this paper, we investigate how to alleviate this problem starting from the SSD framework. Due to their pyramidal design, the lower layer that is responsible for small objects lacks strong semantics… 

MPSSD: Multi-Path Fusion Single Shot Detector

TLDR
This work proposes a novel multi-path design that fully utilizes localization and semantic information, using the original SSD multi-scale features as the base pyramid; the resulting detector outperforms many state-of-the-art detectors.

Dense Receptive Field for Object Detection

TLDR
This paper proposes a novel single-shot detector, called DRFNet, which fuses feature maps with receptive fields of different sizes to boost detection accuracy, and demonstrates that DRFNet outperforms other state-of-the-art one-stage detectors similar to FPN.

Two-layer Residual Feature Fusion for Object Detection

TLDR
This paper proposes a method to enrich the representation power of feature maps through a new feature fusion scheme that draws on information from the consecutive layer, and adopts a unified prediction module with enhanced generalization performance.

Feature reusing and semantic aggregation for single stage object detector

TLDR
A method for boosting the performance of the classical SSD object detector that achieves a good tradeoff between accuracy and speed, detects more small targets, and localizes them accurately.

Weaving Multi-scale Context for Single Shot Detector

TLDR
A novel network topology, called WeaveNet, is proposed that efficiently fuses multi-scale information and boosts detection accuracy at negligible extra cost; it is easy to train without batch normalization and can be further accelerated by the proposed architecture simplification.

Gated bidirectional feature pyramid network for accurate one-shot detection

TLDR
The gated bidirectional feature pyramid network (GBFPN) is a simple and effective architecture that provides a significant improvement over the baseline model, StairNet, and shows state-of-the-art results.

Residual Features and Unified Prediction Network for Single Stage Detection

TLDR
This paper proposes a method to enrich the representation power of feature maps using Resblock and deconvolution layers and enables more precise prediction, which achieves higher score than SSD on PASCAL VOC and MS COCO.

Propose-and-Attend Single Shot Detector

TLDR
A simple yet effective prediction module for a one-stage detector that, unlike previous methods, allows training from scratch without relying on sophisticated base networks, and achieves accuracy comparable to state-of-the-art detectors while using a fraction of their model parameters and computational overhead.
...

References

Showing 10 of 44 references

Context-Aware Single-Shot Detector

TLDR
This paper presents CSSD, short for context-aware single-shot multibox object detector, which is built on top of SSD with additional layers modeling multi-scale contexts, and describes two variants of CSSD that differ in their context layers: one uses dilated convolution layers and the other uses deconvolution layers.
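One of the CSSD variants models context with dilated convolutions. As a generic illustration only (not CSSD's actual layer configuration, which the TLDR does not give), the pure-Python sketch below shows how a dilation rate d widens a k-tap kernel to span (k - 1)d + 1 input samples without adding weights or downsampling. The function names are hypothetical, and the operation is written as cross-correlation, as most deep-learning libraries implement "convolution".

```python
def effective_kernel_size(k: int, dilation: int) -> int:
    """Number of input samples a k-tap kernel spans at the given dilation."""
    return (k - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D dilated cross-correlation (no padding, stride 1).

    Each output sums kernel[t] * signal[start + t * dilation], so the same
    number of weights reaches further apart as the dilation grows.
    """
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        acc = 0.0
        for t, w in enumerate(kernel):
            acc += w * signal[start + t * dilation]
        out.append(acc)
    return out
```

With a 3-tap box kernel on `list(range(10))`, dilation 1 sums three neighbours per output, while dilation 2 sums every other sample across a 5-sample window, trading output length for context.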

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

TLDR
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving an mAP of 53.3%.

You Only Look Once: Unified, Real-Time Object Detection

TLDR
Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.

Feature Pyramid Networks for Object Detection

TLDR
This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.
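The top-down idea behind FPN can be sketched on toy single-channel maps. This is a minimal sketch under stated assumptions, not FPN's or StairNet's implementation: each coarser level is assumed to be exactly half the resolution of the one below, a scalar weight `lateral_w` (hypothetical) stands in for FPN's learned 1x1 lateral convolution, and the 3x3 smoothing convolution applied after each merge is omitted. Starting from the coarsest level, each merged map is upsampled by nearest-neighbour 2x and added to the next finer lateral map.

```python
from typing import List

Map = List[List[float]]  # single-channel feature map, rows of values


def upsample2x(m: Map) -> Map:
    """Nearest-neighbour 2x upsampling: repeat every value along both axes."""
    out = []
    for row in m:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a: Map, b: Map) -> Map:
    """Element-wise sum of two maps with identical shapes."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(m: Map, w: float) -> Map:
    """Stand-in for a 1x1 lateral convolution on a single-channel map."""
    return [[w * v for v in row] for row in m]


def top_down(bottom_up: List[Map], lateral_w: float = 1.0) -> List[Map]:
    """bottom_up[0] is the finest level; each later level halves H and W.

    Returns merged maps in the same fine-to-coarse order.
    """
    merged = [scale(bottom_up[-1], lateral_w)]  # coarsest: lateral path only
    for c in reversed(bottom_up[:-1]):
        merged.insert(0, add(scale(c, lateral_w), upsample2x(merged[0])))
    return merged
```

Feeding an all-zero 4x4 map, `[[1, 2], [3, 4]]`, and `[[10]]` shows the coarse value 10 propagating into every finer level, which is the sense in which top-down aggregation hands stronger semantics down to high-resolution maps.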

Accurate Single Stage Detector Using Recurrent Rolling Convolution

TLDR
A novel single stage end-to-end trainable object detection network is proposed by introducing Recurrent Rolling Convolution (RRC) architecture over multi-scale feature maps to construct object classifiers and bounding box regressors which are deep in context.

SSD: Single Shot MultiBox Detector

TLDR
The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.
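The default-box discretization can be sketched in a few lines. This is a minimal pure-Python sketch, assuming the linear scale rule from the SSD paper, s_k = s_min + (s_max - s_min)(k - 1)/(m - 1) with s_min = 0.2 and s_max = 0.9, plus its extra square box at scale sqrt(s_k s_{k+1}). The aspect-ratio set, the square `fmap_size` grid, and whatever value a caller passes as the last layer's "next" scale are illustrative assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized to [0, 1]


def layer_scales(m: int, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """Linearly spaced box scales for m feature maps (requires m >= 2)."""
    return [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)]


def default_boxes(fmap_size: int, s_k: float, s_next: float,
                  aspect_ratios=(1.0, 2.0, 0.5)) -> List[Box]:
    """Default boxes for one square fmap_size x fmap_size feature map.

    Every cell gets one box per aspect ratio a (w = s_k*sqrt(a), h = s_k/sqrt(a))
    plus an extra square box at the geometric mean of s_k and s_next.
    """
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx = (j + 0.5) / fmap_size  # cell centre, normalized
            cy = (i + 0.5) / fmap_size
            for a in aspect_ratios:
                boxes.append((cx, cy, s_k * math.sqrt(a), s_k / math.sqrt(a)))
            s_prime = math.sqrt(s_k * s_next)
            boxes.append((cx, cy, s_prime, s_prime))
    return boxes
```

A 2x2 map with three aspect ratios yields 2 * 2 * (3 + 1) = 16 boxes. Because every box is tied to a fixed location, scale, and shape, the detector only has to classify and regress offsets per box, which is what makes SSD straightforward to train.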

DSSD: Deconvolutional Single Shot Detector

TLDR
This paper combines a state-of-the-art classifier with a fast detection framework and augments SSD+Residual-101 with deconvolution layers to introduce additional large-scale context in object detection and improve accuracy, especially for small objects.
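Deconvolution (transposed convolution) is what lets a detector like DSSD bring coarse, context-rich maps back up to finer resolutions. As a generic illustration, not DSSD's actual deconvolution module, the sketch below implements a 1-D transposed convolution with no padding: each input sample scatters a scaled copy of the kernel into the output at stride intervals, and overlapping contributions add. The function name is hypothetical.

```python
def conv_transpose1d(x, kernel, stride=2):
    """1-D transposed convolution, no padding.

    Output length is (len(x) - 1) * stride + len(kernel); input sample i
    adds x[i] * kernel[t] to output position i * stride + t.
    """
    k = len(kernel)
    out = [0.0] * ((len(x) - 1) * stride + k)
    for i, v in enumerate(x):
        for t, w in enumerate(kernel):
            out[i * stride + t] += v * w
    return out
```

With kernel `[1, 1]` and stride 2 this reduces to nearest-neighbour upsampling (`[1, 2, 3]` becomes `[1, 1, 2, 2, 3, 3]`); with `[1, 2, 1]` the overlaps blend neighbours. In a network the kernel is learned, so the upsampling pattern adapts to the data.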

R-FCN: Object Detection via Region-based Fully Convolutional Networks

TLDR
This work presents region-based, fully convolutional networks for accurate and efficient object detection, and proposes position-sensitive score maps to address a dilemma between translation-invariance in image classification and translation-variance in object detection.
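Position-sensitive score maps can be illustrated with a toy version of position-sensitive RoI pooling for a single class. This is a minimal sketch as we understand the R-FCN idea, with simplifying assumptions: integer RoI coordinates that lie inside the maps, one class, average pooling within each bin, and average voting across bins. The RoI is split into k x k bins, and bin (i, j) pools only from its own dedicated score map, so a high score requires each part of the object to appear in its expected relative position. The function name is hypothetical.

```python
import math
from typing import List, Tuple

Map = List[List[float]]


def ps_roi_pool(score_maps: List[Map], roi: Tuple[int, int, int, int], k: int) -> float:
    """Position-sensitive RoI average pooling plus voting for ONE class.

    score_maps: k*k maps; map i*k + j scores bin (row i, column j).
    roi: (x0, y0, x1, y1) in map cells, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    w, h = x1 - x0, y1 - y0
    bin_scores = []
    for i in range(k):
        ys = range(y0 + math.floor(i * h / k), y0 + math.ceil((i + 1) * h / k))
        for j in range(k):
            xs = range(x0 + math.floor(j * w / k), x0 + math.ceil((j + 1) * w / k))
            m = score_maps[i * k + j]  # the map dedicated to this bin
            vals = [m[y][x] for y in ys for x in xs]
            bin_scores.append(sum(vals) / len(vals))
    return sum(bin_scores) / len(bin_scores)  # average vote over bins
```

If map (i, j) fires only in quadrant (i, j) of a 4x4 grid, an RoI covering the whole grid scores 1.0, while an RoI shifted onto the bottom-right quadrant scores 0.25: the same features yield different scores depending on where the box sits, which is the translation variance detection needs.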

Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks

TLDR
The Inside-Outside Net (ION), an object detector that exploits information both inside and outside the region of interest, provides strong evidence that context and multi-scale representations improve small object detection.

ParseNet: Looking Wider to See Better

TLDR
This work presents a technique for adding global context to deep convolutional networks for semantic segmentation, and achieves state-of-the-art performance on SiftFlow and PASCAL-Context with small additional computational cost over baselines.
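The global-context technique can be sketched as follows, assuming the ParseNet recipe as we recall it: global average pooling gives one context vector per image, the local features and the context vector are each L2-normalized, and the context vector is "unpooled" (replicated at every location) and concatenated along the channel axis. ParseNet's learned per-channel scale after normalization is omitted here, and the function names are hypothetical.

```python
import math
from typing import List

Feature = List[List[List[float]]]  # indexed as [channel][row][col]


def l2_normalize(vec: List[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit L2 norm (all-zero vectors stay zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def add_global_context(fmap: Feature) -> Feature:
    """Return 2*C channels: L2-normalized local features, then global context."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    # Global average pooling: one value per channel, then normalize.
    g = l2_normalize([sum(sum(row) for row in ch) / (H * W) for ch in fmap])
    # Normalize the channel vector at every spatial location.
    local: Feature = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for y in range(H):
        for x in range(W):
            v = l2_normalize([fmap[c][y][x] for c in range(C)])
            for c in range(C):
                local[c][y][x] = v[c]
    # "Unpool": replicate the context vector at every location.
    context: Feature = [[[g[c]] * W for _ in range(H)] for c in range(C)]
    return local + context  # concatenate along channels
```

Normalizing both parts first keeps the two feature types on comparable scales before concatenation; without it, whichever part has larger activations would dominate the combined representation.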