Annotation Artifacts in Natural Language Inference Data
TLDR
Large-scale datasets for natural language inference are created by presenting crowd workers with a sentence (premise), and asking them to generate three new sentences (hypotheses) that it entails, contradicts, or is logically neutral with respect to.
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
TLDR
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks, showing that a second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains, under both high- and low-resource settings.
Show Your Work: Improved Reporting of Experimental Results
TLDR
We present a novel technique for reporting validation performance of the best-found model as a function of computation budget (i.e., the number of hyperparameter search trials).
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
TLDR
We investigate the extent to which language models can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.
Variational Pretraining for Semi-supervised Text Classification
TLDR
We introduce VAMPIRE, a lightweight pretraining framework for effective text classification when data and computing resources are limited, without the need for computationally demanding sequence-based models.
Analysis of Graph Invariants in Functional Neocortical Circuitry Reveals Generalized Features Common to Three Areas of Sensory Cortex
TLDR
Using the lagged correlation of spiking activity between neurons, we generated functional wiring diagrams to gain insight into the underlying neocortical circuitry.
Emergent coordination underlying learning to reach to grasp with a brain-machine interface.
TLDR
The development of coordinated reach-to-grasp movement has been well studied in infants and children.
Polyglot Text Classification with Neural Document Models
TLDR
We combine a generative, neural document model (Card et al., 2018) and multilingual word vectors (Ammar et al., 2016) to perform text classification on documents in eight languages.
Detoxifying Language Models Risks Marginalizing Minority Voices
TLDR
We show that detoxification makes LMs more brittle to distribution shift, especially on language used by marginalized groups (e.g., African-American English and minority identity mentions).