Publications
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for developing and evaluating machine learning models for sentence understanding, and shows that it poses a substantially more difficult task than the Stanford NLI corpus.
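As a constructed illustration of the shared NLI data format (the premise, hypothesis, and label below are invented for illustration, not drawn from the corpus), each example pairs two sentences with one of three labels:

```python
# Constructed illustration of the three-way NLI format used by
# MultiNLI (and SNLI): a premise, a hypothesis, and a label in
# {entailment, neutral, contradiction}. The text below is invented.
example = {
    "premise": "A man is playing a guitar on stage.",
    "hypothesis": "A musician is performing.",
    "label": "entailment",  # the hypothesis follows from the premise
}
```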
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Presents SuperGLUE, a benchmark styled after GLUE that offers a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard.
ListOps: A Diagnostic Dataset for Latent Tree Learning
Introduces ListOps, a toy dataset created to study the parsing ability of latent tree models, and shows that the current leading latent tree models are unable to learn to parse ListOps and succeed at the task.
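To make the task concrete, here is a minimal sketch of evaluating a ListOps-style expression, assuming the operator set described in the paper (MAX, MIN, MED, and SM for sum modulo 10); the tokenization and the even-length median convention here are assumptions, not taken from the released data:

```python
# Minimal sketch of evaluating a ListOps-style expression: prefix
# operators over nested lists of digits 0-9. The operator set (MAX,
# MIN, MED, SM) follows the paper; tokenization details are assumed.
OPS = {
    "MAX": max,
    "MIN": min,
    "MED": lambda xs: sorted(xs)[len(xs) // 2],  # upper median (assumed convention)
    "SM": lambda xs: sum(xs) % 10,               # sum modulo 10
}

def eval_listops(tokens):
    """Recursively evaluate a tokenized expression such as
    "[MAX 2 9 [MIN 4 7 ] 0 ]".split()."""
    def parse(i):
        tok = tokens[i]
        if tok.startswith("["):            # an operator opens a sub-list
            op, args, i = OPS[tok[1:]], [], i + 1
            while tokens[i] != "]":
                val, i = parse(i)
                args.append(val)
            return op(args), i + 1         # skip the closing "]"
        return int(tok), i + 1             # a digit leaf
    value, _ = parse(0)
    return value

assert eval_listops("[MAX 2 9 [MIN 4 7 ] 0 ]".split()) == 9
```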
The RepEval 2017 Shared Task: Multi-Genre Natural Language Inference with Sentence Representations
Reports that results on the RepEval 2017 shared task were fairly consistent across the genre-matched and genre-mismatched test sets, and across subsets of the test data representing a variety of linguistic phenomena, suggesting that all of the submitted systems learned reasonably domain-independent representations of sentence meaning.
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models
Introduces CrowS-Pairs, a benchmark for measuring some forms of social bias against protected demographic groups in the US in masked language models, and finds that all three of the widely used MLMs evaluated substantially favor sentences that express stereotypes in every category.
Natural Language Understanding with the Quora Question Pairs Dataset
Explores natural language understanding through duplicate question detection on the Quora Question Pairs dataset, finding that a simple continuous bag-of-words neural network model performed best, outdoing more complicated recurrent and attention-based models.
Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark
Concludes, from experiments with the BERT model in limited-data regimes, that low-resource sentence classification remains a challenge for modern neural network approaches to text understanding.
Latent Structure Models for Natural Language Processing
Covers recent advances in discrete latent structure models, discussing their motivation, potential, and limitations, then exploring in detail three strategies for designing such models: gradient approximation, reinforcement learning, and end-to-end differentiable methods.
What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?
Finds that asking workers to write explanations for their examples is ineffective as a stand-alone strategy for boosting NLU example difficulty, while training crowdworkers and then using an iterative process of collecting data, sending feedback, and qualifying workers based on expert judgments is an effective means of collecting challenging data.