Corpus ID: 8217340

Object Detectors Emerge in Deep Scene CNNs

@article{Zhou2015ObjectDE,
  title={Object Detectors Emerge in Deep Scene CNNs},
  author={Bolei Zhou and A. Khosla and {\`A}. Lapedriza and A. Oliva and A. Torralba},
  journal={CoRR},
  year={2015},
  volume={abs/1412.6856}
}
With the success of new computational architectures for visual processing, such as convolutional neural networks (CNN) and access to image databases with millions of labeled examples (e.g., ImageNet, Places), the state of the art in computer vision is advancing rapidly. One important factor for continued progress is to understand the representations that are learned by the inner layers of these deep architectures. Here we show that object detectors emerge from training CNNs to perform scene…
Object Detectors Emerge from Training CNNs for Scene Recognition
With the success of new computational architectures for visual processing, such as convolutional neural networks (CNN) and access to image databases with millions of labeled examples (e.g., ImageNet, …
Places: A 10 Million Image Database for Scene Recognition
TLDR
The Places Database is described, a repository of 10 million scene photographs, labeled with scene semantic categories, comprising a large and diverse list of the types of environments encountered in the world, using the state-of-the-art Convolutional Neural Networks as baselines, that significantly outperform the previous approaches.
Understand scene categories by objects: A semantic regularized scene classifier using Convolutional Neural Networks
TLDR
The proposed deep architecture achieves superior scene classification results to the state-of-the-art on a publicly available SUN RGB-D dataset, and performance of semantic segmentation, the regularizer, reaches a new record with refinement derived from predicted scene labels.
Scene Recognition with CNNs: Objects, Scales and Dataset Bias
TLDR
This paper addresses two related problems: 1) scale induced dataset bias in multi-scale convolutional neural network (CNN) architectures, and 2) how to combine effectively scene-centric and object-centric knowledge (i.e. Places and ImageNet) in CNNs.
Few-shot Learning by Exploiting Visual Concepts within CNNs
TLDR
This work develops novel, flexible, and interpretable models based on the idea of encoding objects in terms of visual concepts (VCs), which are interpretable visual cues represented by the feature vectors within CNNs, and concludes that using VCs helps expose the natural capability of CNNs for few-shot learning.
Locally Supervised Deep Hybrid Model for Scene Recognition
TLDR
This paper proposes a novel locally supervised deep hybrid model (LS-DHM) that effectively enhances and explores the convolutional features for scene recognition, and proposes an efficient Fisher Convolutional vector (FCV) that successfully rescues the orderless mid-level semantic information of scene image.
Recent Advances in Object Detection in the Age of Deep Convolutional Neural Networks
TLDR
This article reviews the recent literature on object detection with deep CNN, in a comprehensive way, and provides an in-depth view of these recent advances.
Places: An Image Database for Deep Scene Understanding
TLDR
The Places Database is described, a repository of 10 million scene photographs, labeled with scene semantic categories and attributes, comprising a quasi-exhaustive list of the types of environments encountered in the world.
Is object localization for free? - Weakly-supervised learning with convolutional neural networks
TLDR
A weakly supervised convolutional neural network is described for object classification that relies only on image-level labels, yet can learn from cluttered scenes containing multiple objects.
An attention model based on spatial transformers for scene recognition
TLDR
This work proposes an end-to-end pipeline combining convolutional neural networks with an explicit attention model to determine several meaningful regions of original images for scene recognition and finds that the model is able to learn informative attention regions for discriminating scene categories.

References

SHOWING 1-10 OF 35 REFERENCES
Learning Deep Features for Scene Recognition using Places Database
TLDR
A new scene-centric database called Places with over 7 million labeled pictures of scenes is introduced with new methods to compare the density and diversity of image datasets and it is shown that Places is as dense as other scene datasets and has more diversity.
Analyzing the Performance of Multilayer Neural Networks for Object Recognition
TLDR
This paper experimentally probes several aspects of CNN feature learning in an attempt to help practitioners gain useful, evidence-backed intuitions about how to apply CNNs to computer vision problems.
Do Convnets Learn Correspondence?
TLDR
Evidence is presented that convnet features localize at a much finer scale than their receptive field sizes, that they can be used to perform intraclass alignment as well as conventional hand-engineered features, and that they outperform conventional features in keypoint prediction on objects from PASCAL VOC 2011.
Weakly supervised object recognition with convolutional neural networks
TLDR
A weakly supervised convolutional neural network for object recognition that does not rely on detailed object annotation and yet returns 86.3% mAP on the Pascal VOC classification task, outperforming previous fully-supervised systems by a sizable margin.
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
TLDR
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.
Learning Hierarchical Features for Scene Labeling
TLDR
A method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel, alleviates the need for engineered features, and produces a powerful representation that captures texture, shape, and contextual information.
Self-taught object localization with deep networks
This paper introduces self-taught object localization, a novel approach that leverages deep convolutional networks trained for whole-image recognition to localize objects in images without additional…
ImageNet Large Scale Visual Recognition Challenge
TLDR
The creation of this benchmark dataset and the advances in object recognition that have been possible as a result are described, and the state-of-the-art computer vision accuracy with human accuracy is compared.
Deep Convolutional Networks for Scene Parsing
We propose a deep learning strategy for scene parsing, i.e. to assign a class label to each pixel of an image. We investigate the use of deep convolutional network for modeling the complex scene…
Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification
TLDR
A high-level image representation, called the Object Bank, is proposed, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task.