Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

Ross B. Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik
2014 IEEE Conference on Computer Vision and Pattern Recognition

Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. [...] Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects, and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.
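
The first insight, applying a network to each bottom-up region proposal, can be sketched roughly as follows. This is only an illustrative outline under stated assumptions: the abstract does not name the proposal method, the network, or the classifier, so `extract_features`, `mean_and_max`, the linear scorer, and the toy image are all hypothetical stand-ins, and the pre-training and fine-tuning steps are not shown.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), with x2 and y2 exclusive
Image = List[List[float]]        # grayscale image stored as rows of pixels


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by a region proposal."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def score_proposals(
    image: Image,
    proposals: Sequence[Box],
    extract_features: Callable[[Image], List[float]],
    class_weights: Dict[str, List[float]],
) -> List[Tuple[Box, str, float]]:
    """Score every proposal against every class.

    extract_features stands in for the CNN; the per-class linear scorer
    is an assumption made for this sketch.
    """
    results = []
    for box in proposals:
        feats = extract_features(crop(image, box))
        for label, weights in class_weights.items():
            score = sum(w * f for w, f in zip(weights, feats))
            results.append((box, label, score))
    return results


def mean_and_max(patch: Image) -> List[float]:
    """Hypothetical placeholder for a CNN: two hand-made features."""
    values = [v for row in patch for v in row]
    return [sum(values) / len(values), max(values)]


# Toy example: a 4x4 image with a bright 2x2 block in one corner.
demo_image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
demo_proposals = [(0, 0, 2, 2), (2, 2, 4, 4)]
demo_results = score_proposals(
    demo_image, demo_proposals, mean_and_max, {"bright": [1.0, 0.0]}
)
```

In this toy setup the proposal that covers the bright block receives the higher "bright" score; a real system would replace the placeholder features with activations from a trained network.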

Region-Based Convolutional Networks for Accurate Object Detection and Segmentation

A simple and scalable detection algorithm that improves mean average precision (mAP) by more than 50 percent relative to the previous best result on VOC 2012, achieving a mAP of 62.4 percent.
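
As a small worked check of what a relative gain of more than 50 percent implies, the sketch below computes the largest baseline mAP consistent with the two figures quoted above. The resulting bound is plain arithmetic on those figures, not a number taken from the paper, and the function names are illustrative.

```python
def relative_improvement(new: float, old: float) -> float:
    """Relative gain of new over old, e.g. 0.5 means 50 percent."""
    return (new - old) / old


def max_baseline_for_gain(new: float, min_relative_gain: float) -> float:
    """Largest old value for which new is at least min_relative_gain better."""
    return new / (1.0 + min_relative_gain)


# With mAP = 62.4 and a relative gain above 50 percent, the previous best
# must lie below 62.4 / 1.5.
implied_bound = max_baseline_for_gain(62.4, 0.50)
```

Here `implied_bound` comes out at about 41.6, so the summary's claim is consistent with any previous best below roughly that value.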

Improve object detection via a multi-feature and multi-task CNN model

An object detection system that combines a standard Fast R-CNN detection branch with a DeepLab semantic segmentation branch. It aggregates hierarchical features into finer feature maps to detect objects at multiple scales, and uses a novel overlap loss function for bounding box regression to improve localization.
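
The summary does not define its overlap loss. As a hedged illustration of the general idea only, the sketch below computes intersection over union (IoU) for two boxes and uses 1 - IoU as a loss; that particular form is an assumption chosen because it is a common overlap-based objective, not the formulation from the paper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    intersection = box_area(
        (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    )
    union = box_area(a) + box_area(b) - intersection
    return intersection / union if union > 0 else 0.0


def overlap_loss(predicted: Box, target: Box) -> float:
    """Assumed loss for this sketch: 0 for a perfect match, 1 for no overlap."""
    return 1.0 - iou(predicted, target)
```

For example, two 2x2 boxes offset by one unit in each direction share an area of 1 out of a union of 7, giving an IoU of 1/7 and a loss of 6/7.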

Boosting Convolutional Features for Robust Object Proposals

A boosting approach that directly exploits hierarchical CNN features to detect regions of interest quickly; it is demonstrated on the ImageNet 2013 detection benchmark and compared with state-of-the-art methods.

CNN based region proposals for efficient object detection

A high-confidence region-based object detection framework that improves classification and detection performance while significantly reducing computational complexity.

Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model

An object detection system built on a multi-region deep convolutional neural network that also encodes semantic segmentation-aware features. It aims to capture a diverse set of discriminative appearance factors and exhibits the localization sensitivity that is essential for accurate object localization.

Simultaneous Detection and Segmentation

This work builds on recent work that uses convolutional neural networks to classify category-independent region proposals (R-CNN), introducing a novel architecture tailored for SDS, and uses category-specific, top-down figure-ground predictions to refine the bottom-up proposals.

Object proposals using CNN-based edge filtering

This paper proposes filtering irrelevant edges using semantic image filtering and true objectness learnt within the convolutional layers of a CNN; the method localizes proposals well by producing highly accurate bounding boxes and reduces the number of proposals.

Convolutional Oriented Boundaries: From Image Segmentation to High-Level Tasks

Convolutional Oriented Boundaries gives a significant leap in performance over the state-of-the-art, and generalizes very well to unseen categories and datasets, and learning to estimate not only contour strength but also orientation provides more accurate results.

Exploit Bounding Box Annotations for Multi-Label Object Recognition

This paper first extracts object proposals from each image, then proposes to make use of ground-truth bounding box annotations (strong labels) to add another level of local information by using nearest-neighbor relationships of local regions to form a multi-view pipeline.

Semi-supervised exemplar learning for object detection in aerial imagery

A semi-supervised approach that combines the strengths of CNN structures with pre-processing steps borrowed from exemplar learning, pairing generically learned class-relatedness with CNN-based detectors.



Learning Hierarchical Features for Scene Labeling

A method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel, alleviates the need for engineered features, and produces a powerful representation that captures texture, shape, and contextual information.

Regionlets for Generic Object Detection

This work models an object class with a cascaded boosting classifier that integrates various types of features from competing local regions, called regionlets, and significantly outperforms the state of the art on popular multi-class detection benchmark datasets with a single method.
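
The general shape of a cascaded boosting classifier can be sketched as below: each stage adds the responses of its weak learners to a running score, and a window is rejected as soon as that score drops below the stage threshold. This is a generic cascade under assumptions, not the model from the paper; the `Stage` layout, the thresholds, and the toy weak learners are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

WeakLearner = Callable[[Sequence[float]], float]
Stage = Tuple[List[WeakLearner], float]  # (weak learners, rejection threshold)


def cascade_score(
    window_features: Sequence[float], stages: Sequence[Stage]
) -> Optional[float]:
    """Return the accumulated score, or None if any stage rejects the window."""
    total = 0.0
    for weak_learners, threshold in stages:
        total += sum(h(window_features) for h in weak_learners)
        if total < threshold:
            return None  # early rejection: later stages are never evaluated
    return total


# Toy cascade with one weak learner per stage (illustrative values only).
demo_stages: List[Stage] = [
    ([lambda f: f[0]], 0.5),
    ([lambda f: f[1]], 1.5),
]
accepted = cascade_score([1.0, 1.0], demo_stages)
rejected = cascade_score([0.2, 5.0], demo_stages)
```

The second window is discarded at the first stage even though its second feature is large, which is the property that lets a cascade spend little time on easy negatives.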

Bottom-Up Segmentation for Top-Down Detection

A novel deformable part-based model that exploits region-based segmentation algorithms, which compute candidate object regions by bottom-up clustering followed by ranking of those regions; it outperforms the previous state of the art on VOC 2010 test by 4%.

Semantic segmentation using regions and parts

A novel design for region-based object detectors that efficiently integrates top-down information from scanning-window part models with global appearance cues. It produces class-specific scores for bottom-up regions and then aggregates the votes of multiple overlapping candidates through pixel classification.

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

This integrated framework for using Convolutional Networks for classification, localization and detection is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 and obtained very competitive results for the detection and classification tasks.

Boosted local structured HOG-LBP for object localization

This paper proposes a boosted Local Structured HOG-LBP object detector that captures the object's local structure, developing the descriptors from shape and texture information respectively, and presents a boosted feature selection and fusion scheme for part-based object detectors.

Measuring the Objectness of Image Windows

A generic objectness measure that quantifies how likely it is for an image window to contain an object of any class. Using objectness as a complementary score in addition to the class-specific model leads to fewer false positives.

Recognition using regions

This paper presents a unified region-based framework for object detection, segmentation, and classification. It uses a generalized Hough voting scheme to generate hypotheses of object locations, scales, and support, followed by a verification classifier and a constrained segmenter on each hypothesis.

CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts

A novel framework that generates and ranks plausible hypotheses for the spatial extent of objects in images using bottom-up computational processes and mid-level selection cues; the algorithm is shown to work successfully in a segmentation-based visual object category recognition pipeline.

Deep Neural Networks for Object Detection

This paper presents a simple and yet powerful formulation of object detection as a regression problem to object bounding box masks, and defines a multi-scale inference procedure which is able to produce high-resolution object detections at a low cost by a few network applications.