Learning Deep Features for Discriminative Localization
In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability.
Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope
  • A. Oliva, A. Torralba
  • Mathematics, Computer Science
  • International Journal of Computer Vision
  • 1 May 2001
TLDR
The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization, and that modeling a holistic representation of the scene informs about its probable semantic category.
Learning Deep Features for Scene Recognition using Places Database
TLDR
A new scene-centric database called Places, with over 7 million labeled pictures of scenes, is introduced along with new methods to compare the density and diversity of image datasets; it is shown that Places is as dense as other scene datasets and has more diversity.
Places: A 10 Million Image Database for Scene Recognition
TLDR
The Places Database is described: a repository of 10 million scene photographs labeled with scene semantic categories, comprising a large and diverse list of the types of environments encountered in the world, with state-of-the-art Convolutional Neural Networks as baselines that significantly outperform previous approaches.
SUN database: Large-scale scene recognition from abbey to zoo
TLDR
This paper proposes the extensive Scene UNderstanding (SUN) database, which contains 899 categories and 130,519 images, and uses 397 well-sampled categories to evaluate numerous state-of-the-art algorithms for scene recognition and establish new bounds of performance.
Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search.
TLDR
An original approach to attentional guidance by global scene context is presented that combines bottom-up saliency, scene context, and top-down mechanisms at an early stage of visual processing, and predicts the image regions likely to be fixated by human observers performing natural search tasks in real-world scenes.
Network Dissection: Quantifying Interpretability of Deep Visual Representations
TLDR
This work uses the proposed Network Dissection method to test the hypothesis that interpretability is an axis-independent property of the representation space, then applies the method to compare the latent representations of various networks trained to solve different classification problems.
Building the gist of a scene: the role of global image features in recognition.
TLDR
It is shown that the structure of a scene image can be estimated by the mean of global image features, providing a statistical summary of the spatial layout properties (the Spatial Envelope representation) of the scene.
Object Detectors Emerge in Deep Scene CNNs
TLDR
This work demonstrates that the same network can perform both scene recognition and object localization in a single forward pass, without ever having been explicitly taught the notion of objects.
From Blobs to Boundary Edges: Evidence for Time- and Spatial-Scale-Dependent Scene Recognition
In very fast recognition tasks, scenes are identified as fast as isolated objects. How can this efficiency be achieved, considering the large number of component objects and interfering factors, such …