GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge transfer across tasks.
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
A new benchmark styled after GLUE is presented, comprising a set of more difficult language understanding tasks, a software toolkit, and a public leaderboard.
Neural Network Acceptability Judgments
This paper introduces the Corpus of Linguistic Acceptability (CoLA), a set of 10,657 English sentences from the published linguistics literature labeled as grammatical or ungrammatical, trains several recurrent neural network models on acceptability classification, and finds that these models outperform the unsupervised models of Lau et al. (2016) on CoLA.
Towards VQA Models That Can Read
A novel model architecture is introduced that reads text in the image, reasons about it in the context of the image and the question, and predicts an answer which might be a deduction based on the text and the image or composed of the strings found in the images.
Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA
A novel model is proposed based on a multimodal transformer architecture accompanied by a rich representation for text in images that enables iterative answer decoding with a dynamic pointer network, allowing the model to form an answer through multi-step prediction instead of one-step classification.
The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes. It is constructed such that unimodal models struggle and only multimodal models can succeed.
TextCaps: a Dataset for Image Captioning with Reading Comprehension
A novel dataset, TextCaps, with 145k captions for 28k images, challenges a model to recognize text, relate it to its visual context, and decide what part of the text to copy or paraphrase, requiring spatial, semantic, and visual reasoning between multiple text tokens and visual entities, such as objects.
Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks
This paper presents the Individualized Controlled Continuous Communication Model (IC3Net), which trains more efficiently than a simple continuous communication model and can be applied to semi-cooperative and competitive settings as well as cooperative ones.
Pythia: A platform for vision & language research
This paper presents Pythia, a deep learning research platform for vision & language tasks. Pythia is built with a plug-&-play strategy at its core, which enables researchers to quickly build new models.