A Simple Method for Commonsense Reasoning
TLDR
Key to this method is the use of language models, trained on a massive amount of unlabeled data, to score multiple-choice questions posed by commonsense reasoning tests; the approach outperforms previous state-of-the-art methods by a large margin.
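The scoring idea can be sketched in a few lines: append each candidate answer to the question, compute the full sequence's log-likelihood under a pretrained language model, and choose the most likely candidate. The sketch below is illustrative only; it uses Hugging Face's GPT-2 rather than the language models trained in the paper, and a simplified Winograd-style example.

```python
# Minimal sketch (not the paper's exact setup) of scoring multiple-choice
# commonsense questions with a pretrained language model: each candidate
# completion is scored by its total log-likelihood, highest score wins.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sequence_log_prob(text: str) -> float:
    """Sum of token log-probabilities of `text` under the language model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # `out.loss` is the mean negative log-likelihood per predicted token.
    num_predicted = ids.shape[1] - 1
    return -out.loss.item() * num_predicted

def pick_answer(context: str, candidates: list[str]) -> str:
    """Return the candidate whose completed sentence the LM finds most likely."""
    scores = [sequence_log_prob(f"{context} {c}") for c in candidates]
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]

# Winograd-style example: resolve the ambiguity by scoring each substitution.
context = "The trophy doesn't fit in the suitcase because"
candidates = ["the trophy is too big.", "the suitcase is too big."]
print(pick_answer(context, candidates))
```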
Learning Longer-term Dependencies in RNNs with Auxiliary Losses
TLDR
This paper proposes a simple method that improves the ability of RNNs to capture long-term dependencies by adding an unsupervised auxiliary loss to the original objective, making truncated backpropagation feasible for long sequences and also improving full BPTT.
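As a rough illustration of the idea, and not the authors' architecture, the sketch below adds a next-step-prediction auxiliary loss over a random segment of the input to an LSTM classifier's main objective; the paper's reconstruction and prediction losses with anchor points and truncated gradients are more elaborate.

```python
# Minimal sketch (assumptions: PyTorch, a toy sequence-classification task,
# a simple next-step-prediction auxiliary objective) of combining a main
# supervised loss with an unsupervised auxiliary loss on an RNN.
import torch
import torch.nn as nn

class AuxLossRNN(nn.Module):
    def __init__(self, vocab_size=100, embed_dim=32, hidden_dim=64, num_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)  # main task head
        self.aux_head = nn.Linear(hidden_dim, vocab_size)     # predicts next token

    def forward(self, tokens, labels, aux_weight=0.5):
        x = self.embed(tokens)                  # (batch, time, embed_dim)
        states, _ = self.rnn(x)                 # (batch, time, hidden_dim)

        # Main supervised loss: classify the sequence from the final state.
        main_loss = nn.functional.cross_entropy(self.classifier(states[:, -1]), labels)

        # Unsupervised auxiliary loss: predict each next token from the current
        # hidden state over a randomly chosen segment of the sequence.
        seg_len = tokens.shape[1] // 2
        start = torch.randint(0, tokens.shape[1] - seg_len, (1,)).item()
        logits = self.aux_head(states[:, start:start + seg_len - 1])
        targets = tokens[:, start + 1:start + seg_len]
        aux_loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.shape[-1]), targets.reshape(-1))

        return main_loss + aux_weight * aux_loss

# Toy usage: random tokens and labels, one training step.
model = AuxLossRNN()
tokens = torch.randint(0, 100, (8, 40))
labels = torch.randint(0, 5, (8,))
loss = model(tokens, labels)
loss.backward()
```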
Selfie: Self-supervised Pretraining for Image Embedding
TLDR
The pretraining technique called Selfie, which stands for SELF-supervised Image Embedding, generalizes the concept of masked language modeling of BERT to continuous data, such as images, by making use of the Contrastive Predictive Coding loss.
We introduce a pretraining technique called Selfie, which stands for SELF-supervised Image Embedding. Selfie generalizes the concept of masked language modeling to continuous data, such as images.
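A rough sketch of the idea, not the paper's implementation: split the image into patches, mask a subset, attention-pool the visible patch encodings into a context vector, and train the model to pick the true patch for each masked position from the other masked patches via a softmax cross-entropy (contrastive) loss. The patch size, dimensions, and single-convolution encoder below are illustrative assumptions.

```python
# Minimal sketch of Selfie-style masked-patch pretraining with a contrastive
# (softmax cross-entropy) loss over distractor patches.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfieSketch(nn.Module):
    def __init__(self, patch=8, dim=64, grid=4):
        super().__init__()
        # Patch encoder: one conv maps each non-overlapping patch to a dim-d vector.
        self.encoder = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos_embed = nn.Embedding(grid * grid, dim)   # position queries
        self.attn_query = nn.Parameter(torch.randn(dim))  # attention-pooling query

    def forward(self, images, num_masked=4):
        b = images.shape[0]
        feats = self.encoder(images).flatten(2).transpose(1, 2)  # (b, patches, dim)
        n = feats.shape[1]

        # Randomly split patch positions into masked (targets) and visible.
        perm = torch.randperm(n)
        masked, visible = perm[:num_masked], perm[num_masked:]

        # Attention-pool the visible patch encodings into one context vector.
        attn = F.softmax(feats[:, visible] @ self.attn_query, dim=1)   # (b, visible)
        context = (attn.unsqueeze(-1) * feats[:, visible]).sum(1)      # (b, dim)

        # For each masked position, form a query = context + position embedding,
        # score it against all masked patches; the true patch is the target,
        # the other masked patches act as distractors.
        queries = context.unsqueeze(1) + self.pos_embed(masked)        # (b, m, dim)
        target_feats = feats[:, masked]                                # (b, m, dim)
        logits = queries @ target_feats.transpose(1, 2)                # (b, m, m)
        labels = torch.arange(num_masked).expand(b, -1)
        return F.cross_entropy(logits.reshape(-1, num_masked), labels.reshape(-1))

# Toy usage: a batch of random 32x32 images, one pretraining step.
model = SelfieSketch()
loss = model(torch.randn(8, 3, 32, 32))
loss.backward()
```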
Do Language Models Have Common Sense?
It has been argued that current machine learning models do not have common sense, and therefore must be hard-coded with prior knowledge (Marcus, 2018). Here we show surprising evidence that language models trained on large amounts of unlabeled data can capture such commonsense knowledge.