Publications
WINOGRANDE: An Adversarial Winograd Schema Challenge at Scale
TLDR
This work introduces WinoGrande, a large-scale dataset of 44k problems inspired by the original Winograd Schema Challenge (WSC) design but adjusted to improve both the scale and the hardness of the dataset, and establishes new state-of-the-art results on five related benchmarks.
Ground Truth for Grammatical Error Correction Metrics
TLDR
The first human evaluation of GEC system outputs is conducted, and it is shown that the rankings produced by metrics such as MaxMatch and I-measure do not correlate well with this ground truth.
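To make the correlation claim concrete, the short Python sketch below compares a metric-induced system ranking against a human ranking using Spearman's rho (scipy.stats.spearmanr). The rankings here are invented for illustration and the paper's own evaluation protocol may differ; this is only one way such (dis)agreement between rankings can be quantified, not the authors' exact procedure.

from scipy.stats import spearmanr

# Hypothetical rankings (1 = best) of six GEC systems: one from human judges,
# one induced by an automatic metric such as MaxMatch. Invented for illustration;
# these are not the paper's data.
human_rank = [1, 2, 3, 4, 5, 6]
metric_rank = [4, 1, 6, 2, 5, 3]

# Spearman's rho measures how well the two orderings agree
# (1 = identical order, 0 = no monotonic relationship, -1 = reversed order).
rho, p_value = spearmanr(human_rank, metric_rank)
print(f"Spearman's rho = {rho:.2f} (p = {p_value:.2f})")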
JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction
TLDR
A new parallel corpus, the JHU FLuency-Extended GUG corpus (JFLEG), is presented; it represents a broad range of language proficiency levels and uses holistic fluency edits not only to correct grammatical errors but also to make the original text more native sounding.
Abductive Commonsense Reasoning
TLDR
This study introduces a challenge dataset, ART, that consists of over 20k commonsense narrative contexts and 200k explanations, and conceptualizes two new tasks: Abductive NLI, a multiple-choice question answering task for choosing the more likely explanation, and Abductive NLG, a conditional generation task for explaining given observations in natural language.
COMET-ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs
TLDR
It is argued that manually constructed commonsense knowledge graphs (CSKGs) will never achieve the coverage necessary to be applicable in all situations encountered by NLP agents, and a new evaluation framework is proposed for testing the utility of KGs based on how effectively implicit knowledge representations can be learned from them.
Universal Decompositional Semantics on Universal Dependencies
TLDR
A framework is presented for augmenting datasets from the Universal Dependencies project with Universal Decompositional Semantics, and results are described from annotating the English Universal Dependencies treebank with word senses, semantic roles, and event properties.
There’s No Comparison: Reference-less Evaluation Metrics in Grammatical Error Correction
TLDR
It is shown that reference-less grammaticality metrics correlate very strongly with human judgments and are competitive with the leading reference-based evaluation metrics.
Robsut Wrod Reocginiton via Semi-Character Recurrent Neural Network
TLDR
Inspired by the findings from the Cmabrigde Uinervtisy effect, a word recognition model based on a semi-character level recurrent neural network (scRNN) is proposed that is significantly more robust in word spelling correction than existing spelling checkers and a character-based convolutional neural network.
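As a rough illustration of the idea (not the authors' code), the Python sketch below builds a semi-character representation: a word is encoded as the concatenation of a one-hot vector for its first character, a bag-of-characters count vector for its internal characters, and a one-hot vector for its last character, which makes the encoding insensitive to scrambling of the middle letters. The lowercase ASCII alphabet is an assumption here.

import numpy as np

# Assumption: lowercase ASCII alphabet only; other characters are ignored.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
INDEX = {c: i for i, c in enumerate(ALPHABET)}
DIM = len(ALPHABET)

def semi_character_vector(word):
    """[one-hot(first char); bag of internal chars; one-hot(last char)]."""
    word = word.lower()
    first, middle, last = np.zeros(DIM), np.zeros(DIM), np.zeros(DIM)
    if word and word[0] in INDEX:
        first[INDEX[word[0]]] = 1.0
    if word and word[-1] in INDEX:
        last[INDEX[word[-1]]] = 1.0
    for c in word[1:-1]:
        if c in INDEX:
            middle[INDEX[c]] += 1.0
    return np.concatenate([first, middle, last])

# Internal-letter scrambling leaves the representation unchanged,
# which is why a downstream recurrent network can still recognize the word.
print(np.array_equal(semi_character_vector("Cmabrigde"),
                     semi_character_vector("Cambridge")))  # True

In the paper, such per-word vectors are fed to a recurrent network that predicts the intended word; the sketch covers only the input encoding.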
Grammatical Error Correction with Neural Reinforcement Learning
TLDR
It is demonstrated that neural reinforcement learning (NRL) outperforms maximum likelihood estimation (MLE) in both human and automated evaluation metrics, achieving the state of the art on a fluency-oriented GEC corpus.
Reassessing the Goals of Grammatical Error Correction: Fluency Instead of Grammaticality
TLDR
It is shown that automatic evaluation with the authors' new annotation scheme correlates very strongly with expert rankings, and a fundamental and necessary shift in the goal of GEC is advocated, from correcting small, labeled error types to producing text that has native fluency.