Annotation Artifacts in Natural Language Inference Data
It is shown that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI and 53% of MultiNLI, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.
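The hypothesis-only baseline behind this finding is easy to sketch. The paper trains a fastText classifier; the stdlib-only toy below stands in for it with a single lexical cue (negation, one of the artifacts the paper reports as correlated with the contradiction class). All names and the cue list are illustrative, not the paper's implementation.

```python
# Toy hypothesis-only NLI baseline: predict a label from the hypothesis
# alone, never looking at the premise. Doing much better than chance this
# way is exactly the annotation-artifact signal the paper measures.
NEGATION_CUES = {"not", "no", "never", "nobody", "nothing"}

def hypothesis_only_predict(hypothesis: str) -> str:
    """Label a hypothesis without access to the premise, using one cue."""
    tokens = set(hypothesis.lower().split())
    # Negation words correlate with contradiction in SNLI (per the paper).
    return "contradiction" if tokens & NEGATION_CUES else "entailment"
```

In the paper's setup the classifier is trained end to end on hypothesis text; the point of even this crude rule is that any above-chance accuracy without the premise indicates the labels leak through the hypotheses.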
SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference
This paper introduces the task of grounded commonsense inference, unifying natural language inference and commonsense reasoning, and proposes Adversarial Filtering (AF), a novel procedure that constructs a de-biased dataset by iteratively training an ensemble of stylistic classifiers and using them to filter the data.
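The core filtering loop can be sketched as follows. This is a heavily simplified, stdlib-only illustration: the trained ensemble of stylistic classifiers is abstracted into a caller-supplied `score_fn`, the data layout and all names are assumptions, and the real procedure retrains the ensemble on a fresh split each round.

```python
def adversarial_filter(items, score_fn, pool, rounds=3):
    """Sketch of an Adversarial-Filtering-style loop.

    items: list of dicts {"gold": str, "distractors": [str, ...]}
    score_fn: stand-in for the trained ensemble; higher means the
              classifier finds the text more plausible as a real ending
    pool: fresh machine-generated candidates to swap in
    """
    for _ in range(rounds):
        for item in items:
            for j, distractor in enumerate(item["distractors"]):
                # If the model scores this distractor well below the gold
                # ending, the item is "easy": replace the distractor with
                # a fresh candidate and let the next round re-test it.
                if pool and score_fn(distractor) < score_fn(item["gold"]):
                    item["distractors"][j] = pool.pop(0)
    return items
```

Iterating replace-and-retest in this way drives out distractors that are stylistically distinguishable, which is the de-biasing effect the paper is after.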
Knowledge Enhanced Contextual Word Representations
After integrating WordNet and a subset of Wikipedia into BERT, the knowledge enhanced BERT (KnowBert) demonstrates improved perplexity, better fact recall as measured by a probing task, and stronger downstream performance on relationship extraction, entity typing, and word sense disambiguation.
Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets
This work introduces inoculation by fine-tuning, a new analysis method for studying challenge datasets by exposing models to a small amount of data from the challenge dataset (a metaphorical pathogen) and assessing how well they can adapt.
Authorship Attribution of Micro-Messages
The concept of an author’s unique “signature” is introduced, and it is shown that such signatures are typical of many authors when writing very short texts.
A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications
The first public dataset of scientific peer reviews available for research purposes (PeerRead v1) is presented and it is shown that simple models can predict whether a paper is accepted with up to 21% error reduction compared to the majority baseline.
Learnability-Based Syntactic Annotation Design
This work presents a methodology for selecting among syntactic annotation schemes and applies it to six central dependency structures, comparing pairs of annotation schemes that differ in the annotation of a single structure; in three of the structures, one annotation is unequivocally better than the alternatives.
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
This work investigates how the performance of the best-found model varies as a function of the number of fine-tuning trials, and examines two factors influenced by the choice of random seed: weight initialization and training data order.
Show Your Work: Improved Reporting of Experimental Results
It is demonstrated that test-set performance scores alone are insufficient for drawing accurate conclusions about which model performs best, and a novel technique is presented: expected validation performance of the best-found model as a function of computation budget.
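The expected-validation-performance curve can be estimated from a set of observed validation scores via the empirical CDF: the expected maximum of n i.i.d. draws is the sum over sorted scores v_i of v_i * (F(v_i)^n - F(v_{i-1})^n). A minimal sketch of that estimator (function and variable names are mine, not the paper's code):

```python
def expected_max_performance(scores, n):
    """Expected validation score of the best of n trials, estimated
    from previously observed scores using the empirical CDF."""
    s = sorted(scores)
    total = len(s)
    # E[max of n draws] = sum_i v_i * (F(v_i)^n - F(v_{i-1})^n),
    # where F is the empirical CDF of the observed scores.
    return sum(
        v * (((i + 1) / total) ** n - (i / total) ** n)
        for i, v in enumerate(s)
    )
```

Plotting this quantity against n (the hyperparameter-search budget) yields the curve the paper proposes reporting alongside the single best test-set number.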
Green AI
Making AI research more efficient will decrease its carbon footprint and increase its inclusivity, since deep learning research should not require the deepest pockets.