Publications
Microsoft COCO Captions: Data Collection and Evaluation Server
TLDR: The Microsoft COCO Caption dataset and evaluation server are described, and several popular metrics, including BLEU, METEOR, ROUGE, and CIDEr, are used to score candidate captions.
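As a rough illustration of metric-based caption scoring (not the official evaluation-server code), the sketch below compares a candidate caption against multiple reference captions with smoothed BLEU from NLTK; the caption strings and whitespace tokenization are placeholders.

```python
# Minimal sketch: score a candidate caption against human references with BLEU-4.
# Illustrative only; the COCO server also reports METEOR, ROUGE, and CIDEr.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [
    "a man is riding a horse on the beach".split(),
    "a person rides a horse along the shoreline".split(),
]
candidate = "a man rides a horse on the beach".split()

score = sentence_bleu(
    references,
    candidate,
    weights=(0.25, 0.25, 0.25, 0.25),              # equal n-gram weights (BLEU-4)
    smoothing_function=SmoothingFunction().method1,  # avoid zero scores on short captions
)
print(f"BLEU-4: {score:.3f}")
```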
Improved Baselines with Momentum Contrastive Learning
TLDR: With simple modifications to MoCo, this note establishes stronger baselines that outperform SimCLR and do not require large training batches, with the aim of making state-of-the-art unsupervised learning research more accessible.
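A minimal sketch of the two ingredients the MoCo line of work relies on: the momentum (EMA) update of the key encoder and the InfoNCE loss against a queue of negatives. Function names, shapes, and the temperature value are assumptions for illustration, not the paper's exact code.

```python
import torch
import torch.nn.functional as F

def momentum_update(encoder_q, encoder_k, m=0.999):
    """EMA update of the key encoder from the query encoder (MoCo-style)."""
    with torch.no_grad():
        for pq, pk in zip(encoder_q.parameters(), encoder_k.parameters()):
            pk.data.mul_(m).add_(pq.data, alpha=1.0 - m)

def info_nce_loss(q, k, queue, temperature=0.2):
    """q, k: (N, D) L2-normalized embeddings of two augmented views of the same images;
    queue: (K, D) negative keys accumulated from previous batches."""
    l_pos = (q * k).sum(dim=1, keepdim=True)              # (N, 1) positive similarities
    l_neg = q @ queue.t()                                  # (N, K) negative similarities
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # positive is index 0
    return F.cross_entropy(logits, labels)
```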
Large Scale Spectral Clustering with Landmark-Based Representation
TLDR: This paper proposes a novel approach, called Landmark-based Spectral Clustering (LSC), for large-scale clustering problems: p representative data points are selected as landmarks, and the spectral embedding of the data is computed efficiently from this landmark-based representation.
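A rough sketch of the landmark idea under stated assumptions (random landmark selection, Gaussian affinities to the r nearest landmarks, embedding from an SVD of the normalized affinity matrix); parameter names and defaults are illustrative, not the paper's.

```python
import numpy as np
from sklearn.cluster import KMeans

def landmark_spectral_clustering(X, n_clusters, p=500, r=5, sigma=1.0, seed=0):
    """Sketch of landmark-based spectral clustering on an (n, d) data matrix X."""
    rng = np.random.default_rng(seed)
    landmarks = X[rng.choice(len(X), size=min(p, len(X)), replace=False)]
    # Gaussian affinities, kept only for the r nearest landmarks (sparse n x p matrix Z).
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(-1)
    Z = np.exp(-d2 / (2 * sigma ** 2))
    far = np.argsort(d2, axis=1)[:, r:]
    np.put_along_axis(Z, far, 0.0, axis=1)
    Z /= Z.sum(axis=1, keepdims=True)                            # row-normalize
    Zhat = Z / np.sqrt(Z.sum(axis=0, keepdims=True) + 1e-12)     # degree scaling of columns
    U, _, _ = np.linalg.svd(Zhat, full_matrices=False)           # embedding from left singular vectors
    return KMeans(n_clusters, n_init=10, random_state=seed).fit_predict(U[:, :n_clusters])
```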
Never-Ending Learning
TLDR: The Never-Ending Language Learner, which achieves some of the desired properties of a never-ending learner, is described, and lessons learned are discussed.
Visualizing and Understanding Neural Models in NLP
TLDR: Four strategies for visualizing compositionality in neural NLP models, inspired by similar work in computer vision, are described, including LSTM-style gates that measure information flow and gradient back-propagation.
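As an illustration of the gradient back-propagation strategy mentioned above, here is a small sketch of first-derivative saliency over token embeddings; the model interface (an embedding matrix in, a score vector out) is an assumption made for the example.

```python
import torch

def token_saliency(model, embeddings, target_index):
    """First-derivative saliency: norm of the gradient of the target score with
    respect to each token embedding (larger norm = more influential token).
    `model` is assumed to map an (L, D) embedding matrix to a vector of scores."""
    embeddings = embeddings.clone().detach().requires_grad_(True)
    score = model(embeddings)[target_index]
    score.backward()
    return embeddings.grad.norm(dim=-1)  # one saliency value per token
```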
Exploring Simple Siamese Representation Learning
TLDR: Surprising empirical results are reported showing that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders.
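A minimal sketch of the stop-gradient, negative-cosine-similarity loss that this kind of simple Siamese setup hinges on; the variable names (encoder outputs z, predictor outputs p) follow the usual convention and are assumptions here, not the authors' exact code.

```python
import torch
import torch.nn.functional as F

def simsiam_loss(p1, p2, z1, z2):
    """Symmetrized negative cosine similarity with stop-gradient on the target branch.
    p1, p2: predictor outputs for the two augmented views; z1, z2: encoder outputs."""
    def d(p, z):
        return -F.cosine_similarity(p, z.detach(), dim=-1).mean()  # stop-gradient via detach
    return 0.5 * d(p1, z2) + 0.5 * d(p2, z1)
```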
Webly Supervised Learning of Convolutional Networks
TLDR: This work trains an initial visual representation on easy images and then adapts this initial CNN to harder, more realistic images by leveraging the structure of the data and categories, demonstrating the strength of webly supervised learning by localizing objects in web images and training an R-CNN-style detector.
NEIL: Extracting Visual Knowledge from Web Data
TLDR: NEIL (Never Ending Image Learner), a computer program that runs 24 hours a day, 7 days a week, to automatically extract visual knowledge from Internet data, is proposed in an attempt to develop the world's largest visual structured knowledge base with minimal human labeling effort.
Towards VQA Models That Can Read
TLDR: A novel model architecture is introduced that reads text in the image, reasons about it in the context of the image and the question, and predicts an answer that may either be a deduction based on the text and the image or be composed of strings found in the image.
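A hypothetical sketch of the "answer from a fixed vocabulary or copied from OCR strings" idea: fixed-vocabulary logits are concatenated with per-OCR-token copy scores so the argmax can select either kind of answer. The layer names and shapes here are assumptions, not the paper's architecture.

```python
import torch

def combined_answer_logits(fused, vocab_head, ocr_feats):
    """fused: (B, D) joint image+question representation,
    vocab_head: linear layer mapping (B, D) -> (B, V) fixed-vocabulary logits,
    ocr_feats: (B, M, D) features of OCR tokens detected in the image.
    Returns (B, V + M) logits; an argmax index >= V means 'copy OCR token index - V'."""
    vocab_logits = vocab_head(fused)                                      # (B, V)
    copy_logits = torch.bmm(ocr_feats, fused.unsqueeze(-1)).squeeze(-1)   # (B, M) dot-product scores
    return torch.cat([vocab_logits, copy_logits], dim=1)
```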
Mind's eye: A recurrent visual representation for image caption generation
TLDR: This paper explores the bi-directional mapping between images and their sentence-based descriptions with a recurrent neural network that attempts to dynamically build a visual representation of the scene as a caption is being generated or read.