• Publications
  • Influence
ImageNet Large Scale Visual Recognition Challenge
The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has beenExpand
  • 16,494
  • 2658
  • PDF
Learning Deep Features for Discriminative Localization
In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization abilityExpand
  • 2,553
  • 581
  • PDF
Places: A 10 Million Image Database for Scene Recognition
The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification performance at tasks such as visual object and sceneExpand
  • 1,013
  • 216
  • PDF
Multimodal Deep Learning
Deep networks have been successfully applied to unsupervised feature learning for single modalities (e.g., text, images or audio). In this work, we propose a novel application of deep networks toExpand
  • 1,943
  • 181
  • PDF
Novel Dataset for Fine-Grained Image Categorization : Stanford Dogs
We introduce a 120 class Stanford Dogs dataset, a challenging and large-scale dataset aimed at fine-grained image categorization. Stanford Dogs includes over 22,000 annotated images of dogs belongingExpand
  • 469
  • 124
  • PDF
Eye Tracking for Everyone
From scientific research to commercial applications, eye tracking is an important tool across many domains. Despite its range of applications, eye tracking has yet to become a pervasive technology.Expand
  • 337
  • 79
  • PDF
Object Detectors Emerge in Deep Scene CNNs
With the success of new computational architectures for visual processing, such as convolutional neural networks (CNN) and access to image databases with millions of labeled examples (e.g., ImageNet,Expand
  • 794
  • 70
  • PDF
Network Dissection: Quantifying Interpretability of Deep Visual Representations
We propose a general framework called Network Dissection for quantifying the interpretability of latent representations of CNNs by evaluating the alignment between individual hidden units and a setExpand
  • 523
  • 66
  • PDF
Understanding and Predicting Image Memorability at a Large Scale
Progress in estimating visual memorability has been limited by the small scale and lack of variety of benchmark data. Here, we introduce a novel experimental procedure to objectively measure humanExpand
  • 164
  • 50
  • PDF
Places: An Image Database for Deep Scene Understanding
The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification at tasks such as object and scene recognition. HereExpand
  • 252
  • 46
  • PDF