Publications
A large annotated corpus for learning natural language inference
TLDR: The Stanford Natural Language Inference corpus is introduced, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning, which allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
TLDR: A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models' understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models are presented; the benchmark favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge transfer across tasks.
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
TLDR: The Multi-Genre Natural Language Inference corpus is introduced, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding, and is shown to present a substantially more difficult task than the Stanford NLI corpus.
Generating Sentences from a Continuous Space
TLDR: This work introduces and studies an RNN-based variational autoencoder generative model that incorporates distributed latent representations of entire sentences, allowing it to explicitly model holistic properties of sentences such as style, topic, and high-level syntactic features.
XNLI: Evaluating Cross-lingual Sentence Representations
TLDR: This work constructs an evaluation set for XLU by extending the development and test sets of the Multi-Genre Natural Language Inference Corpus to 14 languages, including low-resource languages such as Swahili and Urdu. XNLI is found to be a practical and challenging evaluation suite, and directly translating the test data yields the best performance among the available baselines.
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
TLDR: A new benchmark styled after GLUE is presented, comprising a set of more difficult language understanding tasks, a software toolkit, and a public leaderboard.
Annotation Artifacts in Natural Language Inference Data
TLDR: It is shown that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI and 53% of MultiNLI examples, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.
Neural Network Acceptability Judgments
TLDR: This paper introduces the Corpus of Linguistic Acceptability (CoLA), a set of 10,657 English sentences from the published linguistics literature labeled as grammatical or ungrammatical, trains several recurrent neural network models on acceptability classification, and finds that these models outperform the unsupervised models of Lau et al. (2016) on CoLA.
A Gold Standard Dependency Corpus for English
TLDR: It is shown that training a dependency parser on a mix of newswire and web data leads to better performance on web data without hurting performance on newswire text, and therefore that gold-standard annotations for non-canonical text can be a valuable resource for parsing.
What do you learn from context? Probing for sentence structure in contextualized word representations
TLDR: A novel edge probing task design is introduced, and a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline is constructed to investigate how sentence structure is encoded across a range of syntactic, semantic, local, and long-range phenomena.