Places: An Image Database for Deep Scene Understanding

Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Antonio Torralba, Aude Oliva
The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification performance on tasks such as object and scene recognition. With its high coverage and high diversity of exemplars, the Places Database offers an ecosystem to guide future progress on currently intractable visual recognition problems.

Scene Recognition with Sequential Object Context

A deep network architecture is proposed that models the sequential object context of scenes to capture object-level information and incorporates object-object and object-scene relationships in an end-to-end trainable manner.

Multi-Scale Multi-Feature Context Modeling for Scene Recognition in the Semantic Manifold

This paper proposes discriminative patch representations using neural networks, along with a hybrid architecture in which the semantic manifold (SM) is built on top of multiscale CNNs, which can be computed significantly faster than the Gaussian mixture models of the original SM.

Pooling Objects for Recognizing Scenes without Examples

This work is the first to investigate pooling over ten thousand object classifiers to recognize scenes without examples. To steer the knowledge transfer between objects and scenes, it learns a semantic embedding with the aid of a large social multimedia corpus.

Short Literature Review for Visual Scene Understanding

  • S. Achirei
  • Computer Science
    Bulletin of the Polytechnic Institute of Iași. Electrical Engineering, Power Engineering, Electronics Section
  • 2021
The first part of this paper focuses on deep learning solutions for scene recognition along two main lines, low-level features and object detection, and extensively presents the most relevant datasets for visual scene understanding.

Deriving high-level scene descriptions from deep scene CNN features

  • Akram Bayat, M. Pomplun
  • Computer Science
    2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA)
  • 2017
Two computational models are built to estimate two dominant global properties of an input image, naturalness and openness, both of which can be predicted from activations in the lowest layer of a convolutional neural network trained for scene recognition.

Knowledge Guided Disambiguation for Large-Scale Scene Classification With Multi-Resolution CNNs

A multi-resolution CNN architecture that captures visual content and structure at multiple levels is proposed, and two knowledge-guided disambiguation techniques are designed to deal with label ambiguity.

Scene Recognition by Joint Learning of DNN from Bag of Visual Words and Convolutional DCT Features

This paper presents a scene classification method that concatenates local and global features with the DCT-convolutional features of AlexNet and reports a clear gain in accuracy.

Exploring confusing scene classes for the places dataset: Insights and solutions

This work proposes using the filter weights at the last stage of a CNN model trained on the Places dataset to explain the source of confusions, and shows that, for a given baseline CNN, the ASC/RF scheme can offer a significant performance gain.

Exploiting Class Hierarchies for Large-Scale Scene Classification Using Hybrid Discriminative Approach

A model based on fine-to-coarse category mappings is proposed. The mapping information is combined with fused feature descriptors into a single feature representation that improves performance by exploiting the hierarchical relationships among scene categories.

Scene Image Representation by Foreground, Background and Hybrid Features




Learning Deep Features for Scene Recognition using Places Database

A new scene-centric database called Places, with over 7 million labeled pictures of scenes, is introduced along with new methods to compare the density and diversity of image datasets. Places is shown to be as dense as other scene datasets and more diverse.

Object Detectors Emerge in Deep Scene CNNs

This work demonstrates that the same network can perform both scene recognition and object localization in a single forward-pass, without ever having been explicitly taught the notion of objects.

Semantic Understanding of Scenes Through the ADE20K Dataset

This work presents a densely annotated dataset ADE20K, which spans diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts, and shows that the networks trained on this dataset are able to segment a wide variety of scenes and objects.

SUN attribute database: Discovering, annotating, and recognizing scene attributes

This paper performs crowd-sourced human studies to find a taxonomy of 102 discriminative attributes and builds the “SUN attribute database” on top of the diverse SUN categorical database, which has potential for use in high-level scene understanding and fine-grained scene recognition.

SUN database: Large-scale scene recognition from abbey to zoo

This paper proposes the extensive Scene UNderstanding (SUN) database that contains 899 categories and 130,519 images and uses 397 well-sampled categories to evaluate numerous state-of-the-art algorithms for scene recognition and establish new bounds of performance.

ImageNet Large Scale Visual Recognition Challenge

The creation of this benchmark dataset and the advances in object recognition that it has made possible are described, and state-of-the-art computer vision accuracy is compared with human accuracy.

80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition

For certain classes that are particularly prevalent in the dataset, such as people, this work is able to demonstrate a recognition performance comparable to class-specific Viola-Jones style detectors.

The Cityscapes Dataset for Semantic Urban Scene Understanding

This work introduces Cityscapes, a benchmark suite and large-scale dataset for training and testing approaches to pixel-level and instance-level semantic labeling, which exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity.

Microsoft COCO: Common Objects in Context

We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%.