ImageNet: A large-scale hierarchical image database
This paper introduces ImageNet, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than current image datasets.
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
The Visual Genome dataset is presented, which contains over 108K images where each image has an average of 35 objects, 26 attributes, and 21 pairwise relationships between objects, and represents the densest and largest dataset of image descriptions, objects, attributes, relationships, and question-answer pairs.
Progressive Neural Architecture Search
We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary
MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
Experimental results demonstrate that the proposed novel technique of learning another neural network, called MentorNet, to supervise the training of the base deep networks, namely, StudentNet, can significantly improve the generalization performance of deep networks trained on corrupted training data.
AMC: AutoML for Model Compression and Acceleration on Mobile Devices
This paper proposes AutoML for Model Compression (AMC), which leverages reinforcement learning to efficiently sample the design space, improving model compression quality and achieving state-of-the-art model compression results in a fully automated way without human effort.
YFCC100M: the new data in multimedia research
This publicly available curated dataset of almost 100 million photos and videos is free and legal for all.
Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification
A high-level image representation, called the Object Bank, is proposed, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task.
What, where and who? Classifying events by scene and object recognition
  • Li-Jia Li, Li Fei-Fei
  • Computer Science
    IEEE 11th International Conference on Computer…
  • 26 December 2007
This paper proposes a first attempt to classify events in static images by integrating scene and object categorizations, using a number of sports such as snowboarding, rock climbing, and badminton to demonstrate event classification.
Image retrieval using scene graphs
A conditional random field model is proposed that reasons about possible groundings of scene graphs to test images; the full model can be used to improve object localization compared to baseline methods and outperforms retrieval methods that use only objects or low-level image features.
Composing Text and Image for Image Retrieval - an Empirical Odyssey
This paper proposes a new way to combine image and text through a residual connection that outperforms existing approaches on three datasets: Fashion-200k, MIT-States, and a new synthetic dataset the authors create based on CLEVR.