Publications
CLUE: A Chinese Language Understanding Evaluation Benchmark
The first large-scale Chinese Language Understanding Evaluation (CLUE) benchmark is introduced, an open-ended, community-driven project that brings together 9 tasks spanning several well-established single-sentence/sentence-pair classification tasks, as well as machine reading comprehension, all on original Chinese text.
Probing Natural Language Inference Models through Semantic Fragments
This work proposes the use of semantic fragments (systematically generated datasets that each target a different semantic phenomenon) for probing, and efficiently improving, the semantic capabilities of natural language inference models.
Transformers as Soft Reasoners over Language
This work trains transformers to reason (or emulate reasoning) over natural language sentences using synthetically generated data, thus bypassing a formal representation and suggesting a new role for transformers, namely as limited "soft theorem provers" operating over explicit theories in language.
MonaLog: a Lightweight System for Natural Language Inference Based on Monotonicity
MonaLog is shown to be capable of generating large amounts of high-quality training data for BERT, improving its accuracy on SICK, and can be used in combination with BERT, the current state-of-the-art model, in a variety of settings, including compositional data augmentation.
From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project
Unprecedented success on the Grade 8 New York Regents Science Exam, where for the first time a system scores more than 90% on the exam's non-diagram, multiple-choice (NDMC) questions, demonstrates that modern NLP methods can result in mastery of this task.
OCNLI: Original Chinese Natural Language Inference
This paper presents the first large-scale NLI dataset for Chinese, the Original Chinese Natural Language Inference dataset (OCNLI), which follows closely the annotation protocol used for MNLI but creates new strategies for eliciting diverse hypotheses.
What Does My QA Model Know? Devising Controlled Probes Using Expert Knowledge
A methodology for automatically building probe datasets from expert knowledge sources, allowing for systematic control and a comprehensive evaluation, confirms that transformer-based multiple-choice QA models are already predisposed to recognize certain types of structural linguistic knowledge.
Neural Natural Language Inference Models Partially Embed Theories of Lexical Entailment and Negation
It is found that models trained on general-purpose NLI datasets fail systematically on MoNLI examples containing negation, but that MoNLI fine-tuning addresses this failure, suggesting that the BERT model at least partially embeds a theory of lexical entailment and negation at an algorithmic level.
Temporal Reasoning on Implicit Events from Distant Supervision
A neuro-symbolic temporal reasoning model, SymTime, is proposed, which exploits distant supervision signals from large-scale text, uses temporal rules to combine start times and durations to infer end times, and generalizes to other temporal reasoning tasks.
A Dataset for Tracking Entities in Open Domain Procedural Text
This work presents the first dataset for tracking state changes in procedural text from arbitrary domains using an unrestricted (open) vocabulary, creating OPENPI, a high-quality, large-scale dataset comprising 29,928 state changes over 4,050 sentences from 810 real-world procedural paragraphs from WikiHow.com.