Publications
What Makes Reading Comprehension Questions Easier?
TLDR
This study proposes simple heuristics to split each dataset into easy and hard subsets and examines the performance of two baseline models on each subset, observing that baseline performance on the hard subsets degrades remarkably compared to that on the entire datasets.
Assessing the Benchmarking Capacity of Machine Reading Comprehension Datasets
TLDR
Results suggest that most of the questions already answered correctly by the model do not necessarily require grammar and complex reasoning; therefore, MRC datasets will need to take extra care in their design to ensure that questions correctly evaluate the intended skills.
Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps
TLDR
This study presents a new multi-hop QA dataset, called 2WikiMultiHopQA, which uses structured and unstructured data and introduces evidence information containing a reasoning path for multi-hop questions, and demonstrates that the dataset is challenging for multi-hop models and ensures that multi-hop reasoning is required.
Evaluation Metrics for Machine Reading Comprehension: Prerequisite Skills and Readability
TLDR
The dataset analysis suggests that the readability of RC datasets does not directly affect the question difficulty and that it is possible to create an RC dataset that is easy to read but difficult to answer.
An Analysis of Prerequisite Skills for Reading Comprehension
TLDR
A methodology for examining RC systems from multiple viewpoints is proposed, which defines a set of basic skills used for RC, manually annotates questions of an existing RC task, and shows the performance for each skill of existing systems proposed for the task.
Prerequisite Skills for Reading Comprehension: Multi-Perspective Analysis of MCTest Datasets and Systems
TLDR
A methodology inspired by unit testing in software engineering that enables the examination of RC systems from multiple aspects and concludes that the set of prerequisite skills defined are promising for the decomposition and analysis of RC.
Embracing Ambiguity: Shifting the Training Target of NLI Models
TLDR
This paper prepares AmbiNLI, a trial dataset obtained from readily available sources, and shows that it is possible to reduce ChaosNLI divergence scores when fine-tuning on this data, a promising first step toward learning how to capture linguistic ambiguity.
What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?
TLDR
It is found that asking workers to write explanations for their examples is an ineffective stand-alone strategy for boosting NLU example difficulty and that training crowdworkers, and then using an iterative process of collecting data, sending feedback, and qualifying workers based on expert judgments is an effective means of collecting challenging data.
Prerequisites for Explainable Machine Reading Comprehension: A Position Paper
TLDR
It is concluded that future datasets should evaluate the capability of the model for constructing a coherent and grounded representation to understand context-dependent situations and ensure substantive validity by improving the question quality and by formulating a white-box task.
Benchmarking Machine Reading Comprehension: A Psychological Perspective
TLDR
It is concluded that future datasets should evaluate the capability of the model for constructing a coherent and grounded representation to understand context-dependent situations and ensure substantive validity by shortcut-proof questions and explanation as a part of the task design.