Improved Baselines with Momentum Contrastive Learning
With simple modifications to MoCo, this note establishes stronger baselines that outperform SimCLR without requiring large training batches, aiming to make state-of-the-art unsupervised learning research more accessible.
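The core mechanism the summary refers to is the momentum (EMA) update of the key encoder from the query encoder. A minimal numpy sketch of that update, on hypothetical toy parameters (not the paper's implementation):

```python
import numpy as np

def momentum_update(key_params, query_params, m=0.999):
    """MoCo-style EMA update: each key parameter moves a
    fraction (1 - m) toward the corresponding query parameter."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]

# toy parameters (hypothetical shapes, for illustration only)
query = [np.ones(3)]
key = [np.zeros(3)]
key = momentum_update(key, query, m=0.9)  # each entry becomes 0.9*0 + 0.1*1 = 0.1
```

A large `m` (e.g. 0.999) keeps the key encoder slowly moving, which is what lets MoCo maintain a consistent dictionary without large batches.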
Microsoft COCO Captions: Data Collection and Evaluation Server
The Microsoft COCO Caption dataset and evaluation server are described, and several popular metrics, including BLEU, METEOR, ROUGE, and CIDEr, are used to score candidate captions.
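As a rough illustration of one of these metrics, the clipped unigram precision that forms the p_1 term of BLEU can be sketched as follows (a simplified single-reference sketch, not the evaluation server's code):

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Clipped unigram precision (the p_1 term of BLEU):
    each candidate word counts at most as often as it appears
    in the reference."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    clipped = sum(min(c, ref[w]) for w, c in cand.items())
    return clipped / max(1, sum(cand.values()))

p1 = unigram_precision("the the cat", "the cat sat")  # clipped 2 matches / 3 words
```

Full BLEU combines clipped n-gram precisions for several n with a brevity penalty; this shows only the clipping idea.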
Exploring Simple Siamese Representation Learning
Surprising empirical results are reported that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders.
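The key ingredient in SimSiam is a negative cosine similarity loss with a stop-gradient on the target branch. A numpy sketch of that loss (the stop-gradient is symbolic here, since numpy has no autograd; names are illustrative, not the paper's code):

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x)

def simsiam_loss(p, z):
    """Negative cosine similarity between a prediction p and a
    target z; in the real model z is wrapped in stop-gradient so
    no gradient flows through the target branch."""
    z = z.copy()  # stand-in for stop-gradient in this gradient-free sketch
    return -np.dot(l2_normalize(p), l2_normalize(z))

# symmetric form over two augmented views (toy vectors)
p1, z1 = np.array([1.0, 2.0]), np.array([2.0, 1.0])
p2, z2 = np.array([1.0, 2.0]), np.array([1.0, 2.0])
loss = 0.5 * simsiam_loss(p1, z2) + 0.5 * simsiam_loss(p2, z1)
```

Identical, aligned views drive the loss toward -1; the stop-gradient is what prevents the trivial collapsed solution.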
An Empirical Study of Training Self-Supervised Vision Transformers
- Xinlei Chen, Saining Xie, Kaiming He
- Computer Science, IEEE/CVF International Conference on Computer…
- 5 April 2021
This work investigates the effects of several fundamental components for training self-supervised ViT, and reveals that seemingly good results are in fact partial failures, which can be improved when training is made more stable.
Large Scale Spectral Clustering with Landmark-Based Representation
This paper proposes a novel approach, called Landmark-based Spectral Clustering (LSC), for large scale clustering problems, where the original data points are represented as the linear combinations of landmarks and the spectral embedding of the data can be efficiently computed with the landmark-based representation.
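The landmark-based representation the summary describes, encoding each point as a sparse convex combination of its nearest landmarks, can be sketched as follows (a simplified illustration with a Gaussian kernel, not the paper's exact formulation):

```python
import numpy as np

def landmark_representation(X, landmarks, r=2):
    """Encode each data point as nonnegative weights over its r
    nearest landmarks; rows of Z sum to 1, giving the sparse
    representation LSC builds its spectral embedding from."""
    # squared distances from every point to every landmark
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(-1)
    Z = np.zeros_like(d2)
    for i, row in enumerate(d2):
        nn = np.argsort(row)[:r]          # r closest landmarks
        w = np.exp(-row[nn])              # Gaussian-kernel affinities
        Z[i, nn] = w / w.sum()            # normalize to a convex combination
    return Z

# toy data: 2 points, 3 landmarks (illustrative values)
X = np.array([[0.0, 0.0], [10.0, 10.0]])
L = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
Z = landmark_representation(X, L, r=2)
```

Because each point touches only r landmarks, Z is sparse, and the spectral embedding can then be computed from Z far more cheaply than from the full n-by-n affinity matrix.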
Visualizing and Understanding Neural Models in NLP
Four strategies for visualizing compositionality in neural models for NLP, inspired by similar work in computer vision, including LSTM-style gates that measure information flow and gradient back-propagation, are described.
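One of the strategies mentioned, gradient back-propagation, assigns each input unit a saliency equal to the magnitude of the derivative of the output score with respect to that unit. For a linear scorer the gradient is available in closed form, so the idea can be sketched without an autograd framework (a toy illustration, not the paper's models):

```python
import numpy as np

def gradient_saliency(W, x):
    """First-derivative saliency for a linear scorer s(x) = W @ x:
    |d s_top / d x_j| for the top-scoring class, which for a linear
    model is just the corresponding row of W in absolute value."""
    scores = W @ x
    top = int(np.argmax(scores))
    return np.abs(W[top])

# toy 2-class scorer over 2 input units (illustrative weights)
W = np.array([[1.0, -2.0],
              [0.5, 0.5]])
sal = gradient_saliency(W, np.array([1.0, 0.0]))  # -> [1.0, 2.0]
```

For deep models the same quantity is obtained by back-propagating the top score to the inputs; the linear case just makes the gradient explicit.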
The Never-Ending Language Learner (NELL) is described, a system that achieves some of the desired properties of a never-ending learner, and lessons learned are discussed.
Towards VQA Models That Can Read
- Amanpreet Singh, Vivek Natarajan, Marcus Rohrbach
- Computer Science, IEEE/CVF Conference on Computer Vision and…
- 18 April 2019
A novel model architecture is introduced that reads text in the image, reasons about it in the context of the image and the question, and predicts an answer that may be a deduction based on the text and the image or composed of strings found in the image.
NEIL: Extracting Visual Knowledge from Web Data
- Xinlei Chen, Abhinav Shrivastava, A. Gupta
- Computer Science, IEEE International Conference on Computer Vision
- 1 December 2013
NEIL (Never Ending Image Learner), a computer program that runs 24 hours a day, 7 days a week to automatically extract visual knowledge from Internet data, is proposed in an attempt to develop the world's largest visual structured knowledge base with minimal human labeling effort.
Webly Supervised Learning of Convolutional Networks
This work trains an initial visual representation on easy images, then adapts this initial CNN to harder, more realistic images by leveraging the structure of the data and categories, and demonstrates the strength of webly supervised learning by localizing objects in web images and training an R-CNN-style detector.