Scalable Object Detection Using Deep Neural Networks

Authors: D. Erhan, Christian Szegedy, Alexander Toshev, Dragomir Anguelov
Published in: 2014 IEEE Conference on Computer Vision and Pattern Recognition
Deep convolutional neural networks have recently achieved state-of-the-art performance on a number of image recognition benchmarks, including the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC-2012). […] The model naturally handles a variable number of instances for each class and allows for cross-class generalization at the highest levels of the network. We are able to obtain competitive recognition performance on VOC2007 and ILSVRC2012, while using only the top few predicted locations…
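
The phrase "using only the top few predicted locations" can be illustrated with a minimal sketch. It assumes each predicted location is a bounding box carrying a single confidence score; the `Box` type, the `top_k_boxes` helper, and the sample values are hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Hypothetical predicted location: corner coordinates plus one score."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    confidence: float  # assumed class-agnostic score, higher is better


def top_k_boxes(boxes: List[Box], k: int) -> List[Box]:
    """Return the k boxes with the highest confidence, best first."""
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(boxes, key=lambda b: b.confidence, reverse=True)
    return ranked[:k]


# Illustrative values only; they do not come from any dataset.
candidates = [
    Box(0.10, 0.10, 0.50, 0.60, confidence=0.7),
    Box(0.20, 0.30, 0.90, 0.80, confidence=0.9),
    Box(0.00, 0.00, 0.30, 0.20, confidence=0.2),
]
kept = top_k_boxes(candidates, k=2)
```

Only the boxes in `kept` would be passed on to a later classification stage, so the cost of that stage scales with `k` rather than with the number of candidate boxes.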

Region-Based Convolutional Networks for Accurate Object Detection and Segmentation

A simple and scalable detection algorithm that improves mean average precision (mAP) by more than 50 percent relative to the previous best result on VOC 2012, achieving a mAP of 62.4 percent.

Hierarchical part detection with deep neural networks

Experiments show that the hierarchical approach outperforms a network that directly regresses the part locations and obtains part detection accuracy comparable to or better than the state of the art on the CUB-200 bird and Fashionista clothing item datasets with only a fraction of the number of part proposals.

Wide-residual-inception networks for real-time object detection

A wide-residual-inception (WR-Inception) network is proposed, which constructs the architecture based on a residual inception unit that captures objects of various sizes on the same feature map, as well as shallower and wider layers, compared to state-of-the-art networks like ResNets.

Boosting Convolutional Features for Robust Object Proposals

A boosting approach is proposed that directly takes advantage of hierarchical CNN features for fast detection of regions of interest; it is demonstrated on the ImageNet 2013 detection benchmark and compared with state-of-the-art methods.

Self-taught object localization with deep networks

This paper introduces self-taught object localization, a novel approach that leverages deep convolutional networks trained for whole-image recognition to localize objects in images without additional

Point Linking Network for Object Detection

A novel object bounding box representation that uses points and links and is implemented with deep ConvNets, termed Point Linking Network (PLN), which is naturally robust to object occlusion and flexible to variation in object scale and aspect ratio.

Learning to detect and localize many objects from few examples

A new neural model that directly predicts bounding box coordinates and is more powerful than the state of the art in applications where training data is not as abundant as in the classical configuration of natural images and ImageNet/Pascal VOC tasks.

Improving object detection with deep convolutional networks via Bayesian optimization and structured prediction

This work addresses the localization problem by using a search algorithm based on Bayesian optimization that sequentially proposes candidate regions for an object bounding box, and training the CNN with a structured loss that explicitly penalizes the localization inaccuracy.

Learning to decompose for object detection and instance segmentation

This work proposes a novel end-to-end trainable deep neural network architecture that generates the correct number of object instances and their bounding boxes (or segmentation masks) given an image, using only a single network evaluation without any pre- or post-processing steps.

Recent Advances in Object Detection in the Age of Deep Convolutional Neural Networks

This article comprehensively reviews the recent literature on object detection with deep CNNs and provides an in-depth view of these recent advances.



Deep Neural Networks for Object Detection

This paper presents a simple yet powerful formulation of object detection as a regression problem to object bounding box masks, and defines a multi-scale inference procedure that produces high-resolution object detections at a low cost with a few network applications.

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%.

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

This integrated framework for using Convolutional Networks for classification, localization and detection is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 and obtained very competitive results for the detection and classification tasks.

ImageNet classification with deep convolutional neural networks

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

The Pascal Visual Object Classes (VOC) Challenge

The state of the art in evaluated methods for both classification and detection is reviewed, along with whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confuse.

Segmentation as selective search for object recognition

This work adapts segmentation as a selective search by reconsidering segmentation to generate many approximate locations rather than few precise object delineations, because an object whose location is never generated cannot be recognised, and because appearance and immediate nearby context are most effective for object recognition.

Beyond sliding windows: Object localization by efficient subwindow search

A simple yet powerful branch-and-bound scheme that allows efficient maximization of a large class of classifier functions over all possible subimages and converges to a globally optimal solution typically in sublinear time is proposed.

Latent hierarchical structural learning for object detection

This paper describes an incremental concave-convex procedure (iCCCP) that allows both two- and three-layer models to be learned efficiently, and demonstrates the advantages of three-layer hierarchies, outperforming Felzenszwalb et al.'s two-layer models on all 20 classes.

What is an object?

A generic objectness measure, quantifying how likely it is for an image window to contain an object of any class, is presented, combining in a Bayesian framework several image cues measuring characteristics of objects, such as appearing different from their surroundings and having a closed boundary.

Object Detection with Discriminatively Trained Part Based Models

We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in