Publications
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
A new benchmark styled after GLUE is presented, comprising a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard.
What do you learn from context? Probing for sentence structure in contextualized word representations
A novel edge probing task design is introduced and a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline are constructed to investigate how sentence structure is encoded across a range of syntactic, semantic, local, and long-range phenomena.
On Measuring Social Biases in Sentence Encoders
The Word Embedding Association Test is extended to measure bias in sentence encoders, yielding mixed results, including suspicious patterns of sensitivity that suggest the test's assumptions may not hold in general.
Asking and Answering Questions to Evaluate the Factual Consistency of Summaries
QAGS (pronounced "kags"), an automatic evaluation protocol designed to identify factual inconsistencies in a generated summary, is proposed and is believed to be a promising tool in automatically generating usable and factually consistent text.
BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model
It is shown that BERT (Devlin et al., 2018) is a Markov random field language model, and this formulation gives way to a natural procedure to sample sentences from BERT, which can produce high quality, fluent generations.
Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling
The first large-scale systematic study of candidate pretraining tasks, comparing 19 different tasks both as alternatives and complements to language modeling; the primary results support the use of language modeling, especially when combined with pretraining on additional labeled-data tasks.
jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models
jiant is introduced, an open source toolkit for conducting multitask and transfer learning experiments on English NLU tasks, and it is demonstrated that jiant reproduces published performance on a variety of tasks and models.
Probing What Different NLP Tasks Teach Machines about Function Word Comprehension
The results show that pretraining on CCG—the authors' most syntactic objective—performs the best on average across their probing tasks, suggesting that syntactic knowledge helps function word comprehension.
Looking for ELMo's friends: Sentence-Level Pretraining Beyond Language Modeling
The primary results support the use of language modeling as a pretraining task and set a new state of the art among comparable models using multitask learning with language models, and suggest that the widely-used paradigm of pretraining and freezing sentence encoders may not be an ideal platform for further work.