Publications
Neural Network Acceptability Judgments
TLDR
This paper introduces the Corpus of Linguistic Acceptability (CoLA), a set of 10,657 English sentences from the published linguistics literature labeled as grammatical or ungrammatical, trains several recurrent neural network models on acceptability classification, and finds that these models outperform the unsupervised models of Lau et al. (2016) on CoLA.
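CoLA's task format is sentence-level binary classification, conventionally scored with Matthews correlation. As a minimal sketch of that format (a stand-in baseline, not the paper's RNN models), the following trains a bag-of-words classifier on the GLUE distribution of CoLA, assuming the Hugging Face `datasets` package and scikit-learn are installed:

```python
# Minimal sketch: a bag-of-words acceptability classifier on CoLA.
# This is a stand-in baseline, not the paper's RNN setup; it only
# illustrates the task format. Uses the GLUE copy of CoLA from the
# Hugging Face Hub.
from datasets import load_dataset
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef

cola = load_dataset("glue", "cola")  # fields: "sentence", "label" (1 = acceptable)

vectorizer = CountVectorizer(ngram_range=(1, 2))
X_train = vectorizer.fit_transform(cola["train"]["sentence"])
X_dev = vectorizer.transform(cola["validation"]["sentence"])

clf = LogisticRegression(max_iter=1000).fit(X_train, cola["train"]["label"])

# Matthews correlation coefficient (MCC) is the standard CoLA metric.
preds = clf.predict(X_dev)
print("MCC:", matthews_corrcoef(cola["validation"]["label"], preds))
```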
BLiMP: A Benchmark of Linguistic Minimal Pairs for English
TLDR
This paper introduces the Benchmark of Linguistic Minimal Pairs (BLiMP), a challenge set for evaluating the linguistic knowledge of language models (LMs) across major grammatical phenomena in English, and finds that state-of-the-art models reliably identify morphological contrasts related to agreement but struggle with some subtle semantic and syntactic phenomena.
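BLiMP's evaluation criterion is simple: an LM is credited for a minimal pair when it assigns higher total probability to the acceptable sentence. A minimal sketch of that criterion, assuming GPT-2 as the model under test and the Hugging Face Hub copy of BLiMP (anaphor_gender_agreement is one of its 67 paradigms):

```python
# Minimal sketch of BLiMP-style evaluation: the model is credited when it
# assigns a higher total log-probability to the grammatical sentence of a
# minimal pair. GPT-2 here is an illustrative stand-in.
import torch
from datasets import load_dataset
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_logprob(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # The returned loss is the mean per-token negative log-likelihood;
        # multiply by the number of predicted tokens to get the total.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

# Score the first 100 pairs of one paradigm for speed (a full run covers
# 1,000 pairs in each of 67 paradigms).
pairs = load_dataset("blimp", "anaphor_gender_agreement")["train"].select(range(100))
correct = sum(
    sentence_logprob(ex["sentence_good"]) > sentence_logprob(ex["sentence_bad"])
    for ex in pairs
)
print(f"Accuracy: {correct / len(pairs):.2f}")
```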
Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)
TLDR
This paper introduces MSGS (the Mixed Signals Generalization Set), an English-language diagnostic set of 20 ambiguous binary classification tasks that test whether a pretrained model prefers linguistic or surface generalizations during fine-tuning, and finds that models can learn to represent linguistic features with little pretraining data but require far more data to learn to prefer linguistic generalizations over surface ones.
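The design turns on disambiguating test data: in training, a surface feature and a linguistic feature predict the label equally well; at test time they conflict, so the model's predictions reveal which generalization it adopted. A toy sketch of the scoring idea follows (the labels are fabricated for illustration, and the paper's exact metric and features may differ):

```python
# Toy sketch of the MSGS "ambiguous task" design. At test time the surface
# and linguistic features disagree by construction, so each item is labeled
# here by the linguistic feature and the surface feature implies the
# opposite label. The values below are fabricated for illustration.
from sklearn.metrics import matthews_corrcoef

linguistic_labels = [1, 0, 1, 1, 0, 0, 1, 0]  # labels under the linguistic rule
model_predictions = [1, 0, 1, 0, 0, 1, 1, 0]  # a hypothetical model's outputs

# Correlating predictions with the linguistic labels gives a bias score:
# +1 means a fully linguistic generalization, -1 a fully surface one.
bias = matthews_corrcoef(linguistic_labels, model_predictions)
print(f"linguistic bias score: {bias:+.2f}")  # +0.50 here: leaning linguistic
```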
When Do You Need Billions of Words of Pretraining Data?
TLDR
While the ability to encode linguistic features is almost certainly necessary for language understanding, it is likely that other, unidentified, forms of knowledge are the major drivers of recent improvements in language understanding among large pretrained models.
Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs
TLDR
It is concluded that a variety of methods is necessary to reveal all relevant aspects of a model’s grammatical knowledge in a given domain.
Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition
TLDR
It is found that BERT learns to draw pragmatic inferences, and that NLI training encourages models to learn some, but not all, pragmatic inferences.
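One quick way to probe this kind of inference is to hand an NLI model a scalar-implicature pair: under the pragmatic reading, "some" implies "not all", so a contradiction label suggests a pragmatic reading and a neutral label a purely logical one. A minimal sketch using roberta-large-mnli as a convenient public checkpoint (not necessarily one evaluated in the paper):

```python
# Minimal sketch of an IMPPRES-style scalar-implicature probe.
# "roberta-large-mnli" is a convenient public MNLI checkpoint, used here
# only for illustration.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli").eval()

premise = "Some of the students passed the exam."
hypothesis = "All of the students passed the exam."

inputs = tok(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# CONTRADICTION suggests the pragmatic reading ("some" -> "not all");
# NEUTRAL suggests the purely logical reading.
print(model.config.id2label[logits.argmax(-1).item()])
```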
Can neural networks acquire a structural bias from raw linguistic data?
TLDR
This work finds that BERT makes a structural generalization in three out of four empirical domains (subject-auxiliary inversion, reflexive binding, and verb tense detection in embedded clauses) but makes a linear generalization when tested on NPI licensing, providing tentative evidence that some linguistic universals can be acquired by learners without innate biases.
Verb Argument Structure Alternations in Word and Sentence Embeddings
TLDR
The authors' models perform reliable classifications for some verbal alternations but not others, suggesting that while these representations do encode fine-grained lexical information, it is incomplete or can be hard to extract.
Grammatical Analysis of Pretrained Sentence Encoders with Acceptability Judgments
TLDR
A grammatically annotated development set for the Corpus of Linguistic Acceptability (CoLA; Warstadt et al., 2018) is introduced, which is used to investigate the grammatical knowledge of three pretrained encoders, including the popular OpenAI Transformer and BERT.
What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?
TLDR
It is found that asking workers to write explanations for their examples is ineffective as a stand-alone strategy for boosting NLU example difficulty, and that training crowdworkers and then using an iterative process of collecting data, sending feedback, and qualifying workers based on expert judgments is an effective means of collecting challenging data.