ImageNet: A large-scale hierarchical image database
- Jia Deng, Wei Dong, R. Socher, Li-Jia Li, K. Li, Li Fei-Fei
- Computer Science, IEEE Conference on Computer Vision and Pattern…
- 20 June 2009
A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
ImageNet Large Scale Visual Recognition Challenge
- Olga Russakovsky, Jia Deng, Li Fei-Fei
- Computer Science, International Journal of Computer Vision
- 1 September 2014
This paper describes the creation of this benchmark dataset and the advances in object recognition it has made possible, and compares state-of-the-art computer vision accuracy with human accuracy.
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
- Justin Johnson, Alexandre Alahi, Li Fei-Fei
- Computer Science, European Conference on Computer Vision
- 27 March 2016
This work considers image transformation problems and proposes perceptual loss functions for training feed-forward networks on image transformation tasks. It shows results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real time.
3D Object Representations for Fine-Grained Categorization
- J. Krause, Michael Stark, Jia Deng, Li Fei-Fei
- Computer Science, IEEE International Conference on Computer Vision…
- 2 December 2013
This paper lifts two state-of-the-art 2D object representations to 3D, on the level of both local feature appearance and location, and shows their efficacy for estimating 3D geometry from images via ultra-wide baseline matching and 3D reconstruction.
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
- Ranjay Krishna, Yuke Zhu, Li Fei-Fei
- Computer Science, International Journal of Computer Vision
- 23 February 2016
The Visual Genome dataset is presented, which contains over 108K images where each image has an average of 35 objects, 26 attributes, and 21 pairwise relationships between objects, and represents the densest and largest dataset of image descriptions, objects, attributes, relationships, and question answer pairs.
Deep Visual-Semantic Alignments for Generating Image Descriptions
- A. Karpathy, Li Fei-Fei
- Computer Science, IEEE Transactions on Pattern Analysis and Machine…
- 6 December 2014
This work presents a model that generates natural language descriptions of images and their regions, based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding.
Large-Scale Video Classification with Convolutional Neural Networks
- A. Karpathy, G. Toderici, Sanketh Shetty, Thomas Leung, R. Sukthankar, Li Fei-Fei
- Computer Science, IEEE Conference on Computer Vision and Pattern…
- 23 June 2014
This work studies multiple approaches for extending the connectivity of a CNN in the time domain to take advantage of local spatio-temporal information, and suggests a multiresolution, foveated architecture as a promising way of speeding up training.
Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories
- Li Fei-Fei, R. Fergus, P. Perona
- Computer Science, Conference on Computer Vision and Pattern…
- 27 June 2004
Social LSTM: Human Trajectory Prediction in Crowded Spaces
- Alexandre Alahi, Kratarth Goel, Vignesh Ramanathan, Alexandre Robicquet, Li Fei-Fei, S. Savarese
- Computer Science, Computer Vision and Pattern Recognition
- 27 June 2016
This work proposes an LSTM model that learns general human movement and predicts future trajectories, and that outperforms state-of-the-art methods on some of the datasets tested.
Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks
- Agrim Gupta, Justin Johnson, Li Fei-Fei, S. Savarese, Alexandre Alahi
- Computer Science, IEEE/CVF Conference on Computer Vision and…
- 29 March 2018
A recurrent sequence-to-sequence model observes motion histories and predicts future behavior, using a novel pooling mechanism to aggregate information across people, and outperforms prior work in terms of accuracy, variety, collision avoidance, and computational complexity.
...