R-FCN-3000 at 30fps: Decoupling Detection and Classification

@article{Singh2018RFCN3000A3,
  title={R-FCN-3000 at 30fps: Decoupling Detection and Classification},
  author={Bharat Singh and Hengduo Li and Abhishek Sharma and Larry S. Davis},
  journal={2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2018},
  pages={1081-1090}
}
  • Bharat Singh, Hengduo Li, Abhishek Sharma, Larry S. Davis
  • Published 5 December 2017
  • Computer Science
  • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
We propose a modular approach towards large-scale real-time object detection by decoupling objectness detection and classification. We exploit the fact that many object classes are visually similar and share parts. Thus, a universal objectness detector can be learned for class-agnostic object detection followed by fine-grained classification using a (non)linear classifier. Our approach is a modification of the R-FCN architecture to learn shared filters for performing localization across… 
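The decoupling described in the abstract can be illustrated with a small sketch. It assumes the final per-class detection score is the product of a class-agnostic objectness score and a softmax over fine-grained class logits. The abstract shown here is truncated, so this combination rule and the names `softmax` and `decoupled_scores` are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decoupled_scores(objectness, class_logits):
    """Combine a class-agnostic objectness score with fine-grained class
    probabilities (assumed rule: simple product, for illustration only)."""
    probs = softmax(class_logits)
    return [objectness * p for p in probs]


# One region proposal: a confident "is an object" score, three candidate classes.
scores = decoupled_scores(0.9, [2.0, 0.5, -1.0])
best_class = max(range(len(scores)), key=scores.__getitem__)
```

Under this assumption the localization branch only needs to answer "is there an object here?", so adding a class changes the classifier but not the detector.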
Scaling Object Detection by Transferring Classification Weights
TLDR
This paper first introduces input and feature normalization schemes to curb under-fitting during training of a vanilla WTN, and then proposes an autoencoder-WTN (AE-WTN), which uses a reconstruction loss to preserve the classification network's information over all classes in the target latent space and so ensure generalization to novel classes.
Learning Open-World Object Proposals Without Learning to Classify
TLDR
A classification-free Object Localization Network (OLN) which estimates the objectness of each region purely by how well the location and shape of a region overlap with any ground-truth object (e.g., centerness and IoU).
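The two localization-quality signals named in this summary, IoU and centerness, can be computed as in the sketch below. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and centerness uses the common FCOS-style formula. The summary does not give the exact definition, so both the formula and the function names are assumptions for illustration.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def centerness(point, box):
    """FCOS-style centerness of a point inside a box (assumed definition).
    Returns 0.0 for points outside the box or on its border."""
    x, y = point
    left, right = x - box[0], box[2] - x
    top, bottom = y - box[1], box[3] - y
    if min(left, right, top, bottom) <= 0:
        return 0.0
    return math.sqrt(
        (min(left, right) / max(left, right))
        * (min(top, bottom) / max(top, bottom))
    )
```

Neither quantity depends on a class label, which is what allows objectness to be learned without a classifier.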
An Analysis of Pre-Training on Object Detection
TLDR
This work provides a detailed analysis of convolutional neural networks which are pre-trained on the task of object detection, and analyzes how well their features generalize to tasks like image classification, semantic segmentation and object detection on small datasets like PASCAL-VOC, Caltech-256, SUN-397, Flowers-102 etc.
Detecting 11K Classes: Large Scale Object Detection Without Fine-Grained Bounding Boxes
TLDR
This paper proposes a semi-supervised, large-scale, fine-grained detection method that needs only bounding-box annotations for a smaller number of coarse-grained classes and image-level labels for the large-scale fine-grained classes, and can detect all classes at nearly fully supervised accuracy.
Hierarchical Structure and Joint Training for Large Scale Semi-supervised Object Detection
TLDR
A novel hierarchical structure and joint training framework for large scale semi-supervised object detection is proposed, utilizing the relationships among target categories to model a hierarchical network to further improve the performance of recognition.
What leads to generalization of object proposals?
TLDR
This work introduces the idea of prototypical classes: a set of sufficient and necessary classes required to train a detection model to obtain generalized proposals in a more data-efficient way, and demonstrates that a Faster R-CNN model leads to better generalization of proposals than a single-stage network like RetinaNet.
DeRPN: Taking a further step toward more general object detection
TLDR
A novel dimension-decomposition region proposal network (DeRPN) that can perfectly displace the traditional Region Proposal Network (RPN); it uses an anchor string mechanism to match object widths and heights independently, which is conducive to treating variant object shapes.
Soft Sampling for Robust Object Detection
TLDR
The robustness of object detection in the presence of missing annotations is studied, and it is observed that after dropping 30% of the annotations, the performance of CNN-based object detectors like Faster-RCNN drops by only 5% on the PASCAL VOC dataset.
Deep Learning for Generic Object Detection: A Survey
TLDR
A comprehensive survey of the recent achievements in this field brought about by deep learning techniques, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics.
Scale-Insensitive Object Detection via Attention Feature Pyramid Transformer Network
TLDR
A novel end-to-end Attention Feature Pyramid Transformer Network framework that learns object detectors with multi-scale feature maps in a transformer encoder-decoder fashion and achieves state-of-the-art results.

References

SHOWING 1-10 OF 43 REFERENCES
LSDA: Large Scale Detection through Adaptation
TLDR
This paper proposes Large Scale Detection through Adaptation (LSDA), an algorithm which learns the difference between the two tasks and transfers this knowledge to classifiers for categories without bounding box annotated data, turning them into detectors.
R-FCN: Object Detection via Region-based Fully Convolutional Networks
TLDR
This work presents region-based, fully convolutional networks for accurate and efficient object detection, and proposes position-sensitive score maps to address a dilemma between translation-invariance in image classification and translation-variance in object detection.
Revisiting Knowledge Transfer for Training Object Class Detectors
TLDR
A unified knowledge transfer framework based on training a single neural network multi-class object detector over all source classes, organized in a semantic hierarchy is presented, establishing its general applicability.
You Only Look Once: Unified, Real-Time Object Detection
TLDR
Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
YOLO9000: Better, Faster, Stronger
TLDR
YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories, is introduced and a method to jointly train on object detection and classification is proposed, both novel and drawn from prior work.
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
TLDR
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.
Focal Loss for Dense Object Detection
TLDR
This paper proposes to address the extreme foreground-background class imbalance encountered during training of dense detectors by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples, and develops a novel Focal Loss, which focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
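The reshaped cross-entropy described in this summary is commonly written as FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). Below is a minimal scalar sketch of that formula for a binary label. The function name and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration, and the code is not taken from any particular library.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one prediction.

    p: predicted probability of the positive class, in [0, 1]
    y: ground-truth label, 1 (foreground) or 0 (background)
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


easy = binary_focal_loss(0.9, 1)  # well-classified positive
hard = binary_focal_loss(0.1, 1)  # badly misclassified positive
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross entropy; larger `gamma` pushes more of the total loss onto hard examples.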
Large Scale Semi-Supervised Object Detection Using Visual and Semantic Knowledge Transfer
TLDR
Strong evidence is found that visual similarity and semantic relatedness are complementary for the task, and when combined notably improve detection, achieving state-of-the-art detection performance in a semi-supervised setting.
PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN
TLDR
A Parallel, Pairwise Region-based, Fully Convolutional Network (PPR-FCN) for WSVRD uses a parallel FCN architecture that simultaneously performs pair selection and classification of single regions and region pairs for object and relation detection, while sharing almost all computation over the entire image.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
TLDR
This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.