• Publications
  • Influence
ImageNet: A large-scale hierarchical image database
TLDR
A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
ImageNet Large Scale Visual Recognition Challenge
TLDR
The creation of this benchmark dataset and the advances in object recognition that have been possible as a result are described, and the state-of-the-art computer vision accuracy with human accuracy is compared.
ImageNet: A large-scale hierarchical image database
TLDR
A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
TLDR
This work considers image transformation problems, and proposes the use of perceptual loss functions for training feed-forward networks for image transformation tasks, and shows results on image style transfer, where aFeed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real-time.
3D Object Representations for Fine-Grained Categorization
TLDR
This paper lifts two state-of-the-art 2D object representations to 3D, on the level of both local feature appearance and location, and shows their efficacy for estimating 3D geometry from images via ultra-wide baseline matching and 3D reconstruction.
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
TLDR
The Visual Genome dataset is presented, which contains over 108K images where each image has an average of $$35$$35 objects, $$26$$26 attributes, and $$21$$21 pairwise relationships between objects, and represents the densest and largest dataset of image descriptions, objects, attributes, relationships, and question answer pairs.
Large-Scale Video Classification with Convolutional Neural Networks
TLDR
This work studies multiple approaches for extending the connectivity of a CNN in time domain to take advantage of local spatio-temporal information and suggests a multiresolution, foveated architecture as a promising way of speeding up the training.
Social LSTM: Human Trajectory Prediction in Crowded Spaces
TLDR
This work proposes an LSTM model which can learn general human movement and predict their future trajectories and outperforms state-of-the-art methods on some of these datasets.
A Bayesian hierarchical model for learning natural scene categories
  • Li Fei-Fei, P. Perona
  • Computer Science
    IEEE Computer Society Conference on Computer…
  • 20 June 2005
TLDR
This work proposes a novel approach to learn and recognize natural scene categories by representing the image of a scene by a collection of local regions, denoted as codewords obtained by unsupervised learning.
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
TLDR
This work presents a diagnostic dataset that tests a range of visual reasoning abilities and uses this dataset to analyze a variety of modern visual reasoning systems, providing novel insights into their abilities and limitations.
...
1
2
3
4
5
...