Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension

  • Kai Sun, Dian Yu, Dong Yu, Claire Cardie
  • Published 21 April 2019
  • Computer Science
  • Transactions of the Association for Computational Linguistics
Machine reading comprehension tasks require a machine reader to answer questions relevant to the given document. In this paper, we present the first free-form multiple-Choice Chinese machine reading Comprehension dataset (C3), containing 13,369 documents (dialogues or more formally written mixed-genre texts) and their associated 19,577 multiple-choice free-form questions collected from Chinese-as-a-second-language examinations. We present a comprehensive analysis of the prior knowledge (i.e… 

Enhancing Lexical-Based Approach With External Knowledge for Vietnamese Multiple-Choice Machine Reading Comprehension

This work constructs a dataset of 2,783 multiple-choice question-answer pairs based on 417 Vietnamese texts. It also proposes a lexical-based MRC method that uses semantic similarity measures and external knowledge sources to analyze questions and extract answers from the given text.
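The summary does not spell out how a lexical-based multiple-choice reader works. As a rough illustration only, here is a minimal sketch of a generic sliding-window word-overlap scorer. It is not the paper's method: it omits the semantic similarity measures and external knowledge sources, assumes a crude regex tokenizer, and all function names are hypothetical.

```python
import re
from typing import List, Set


def tokenize(text: str) -> List[str]:
    # Hypothetical helper: lowercase and split on non-word characters.
    return re.findall(r"\w+", text.lower())


def sliding_window_score(passage: List[str], target: Set[str], window: int) -> int:
    # Largest count of passage tokens, within any window of `window` tokens,
    # that also appear in the target set (question words plus option words).
    best = 0
    for start in range(max(1, len(passage) - window + 1)):
        span = passage[start:start + window]
        best = max(best, sum(1 for tok in span if tok in target))
    return best


def choose_answer(passage: str, question: str, options: List[str]) -> int:
    # Hypothetical scorer: pick the option whose words, together with the
    # question's, best overlap one contiguous region of the passage.
    p_tokens = tokenize(passage)
    q_tokens = set(tokenize(question))
    scores = []
    for option in options:
        target = q_tokens | set(tokenize(option))
        window = max(1, len(target))  # window size tied to the target size
        scores.append(sliding_window_score(p_tokens, target, window))
    # Ties resolve to the earliest option.
    return max(range(len(options)), key=lambda i: scores[i])
```

For example, `choose_answer("Mary went to the market to buy apples. John stayed home.", "What did Mary buy?", ["apples", "bananas", "a car"])` selects index 0, because only the first option adds a word that co-occurs with the question words in the passage. A real system along the lines described would replace exact token matching with semantic similarity and enrich the target set with external knowledge.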

ExpMRC: explainability evaluation for machine reading comprehension

A Survey on Machine Reading Comprehension Systems

The survey shows that, in recent years, the focus of research has shifted from answer extraction to answer generation, from single- to multi-document reading comprehension, and from learning from scratch to using pre-trained word vectors.

Unsupervised Explanation Generation for Machine Reading Comprehension

This paper proposes a self-explainable framework for the machine reading comprehension task that tries to use less passage information while achieving results similar to a system that uses the whole passage; the filtered passage then serves as the explanation.

Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge

This paper aims to extract a new kind of structured knowledge from scripts and use it to improve MRC, and designs a teacher-student paradigm with multiple teachers to facilitate the transfer of knowledge in weakly-labeled MRC data.

Native Chinese Reader: A Dataset Towards Native-Level Chinese Machine Reading Comprehension

Native Chinese Reader (NCR) is a new machine reading comprehension (MRC) dataset with particularly long articles in both modern and classical Chinese; results on it indicate a performance gap between current MRC models and native Chinese speakers.

GCRC: A New Challenging MRC Dataset from Gaokao Chinese for Explainable Evaluation

This paper proposes GCRC, a new dataset of challenging, high-quality multiple-choice questions collected from Gaokao Chinese (the Chinese subject of the National College Entrance Examination of China). It shows that the dataset is more challenging than existing ones and very useful for identifying the limitations of existing MRC systems in an explainable way.

Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question Answering Data

This paper proposes a self-teaching paradigm that makes better use of generated weakly labeled MRC instances to improve a target MRC task, and demonstrates both the effectiveness of the framework and the usefulness of large-scale subject-area question-answering data for machine reading comprehension.

A Chinese Machine Reading Comprehension Dataset Automatic Generated Based on Knowledge Graph

This paper proposes an automatic method for MRC dataset generation and uses it to build CMedRC, presently the largest Chinese medical reading comprehension dataset, which contains 17k questions generated by the automatic method along with some seed questions.

Contrastive Learning between Classical and Modern Chinese for Classical Chinese Machine Reading Comprehension

This paper proposes a contrastive learning method between classical and modern Chinese to reach a deep understanding of the two different styles. The method improves language understanding ability and outperforms existing pre-trained language models (PLMs) on the Haihua, CCLUE, and ChID datasets.



Improving Machine Reading Comprehension with General Reading Strategies

This paper proposes three general strategies for improving non-extractive machine reading comprehension (MRC) and demonstrates both their effectiveness and the versatility and general applicability of fine-tuned models that incorporate them.

DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension

DREAM is the first dialogue-based multiple-choice reading comprehension data set to focus on in-depth multi-turn multi-party dialogue understanding. Experimental results on DREAM show the effectiveness of dialogue structure and general world knowledge.

A Span-Extraction Dataset for Chinese Machine Reading Comprehension

This paper introduces a span-extraction dataset for Chinese machine reading comprehension to add language diversity to this area; the authors also hosted the Second Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2018).

DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications

This paper introduces DuReader, a new large-scale, open-domain Chinese machine reading comprehension (MRC) dataset designed to address real-world MRC; the authors also organize a shared competition to encourage the exploration of more models.

Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences

The dataset is the first to study multi-sentence inference at scale, with an open-ended set of question types that require reasoning skills; human solvers achieve an F1-score of 88.1% on it.

Broad Context Language Modeling as Reading Comprehension

This work views LAMBADA as a reading comprehension problem and applies comprehension models based on neural networks, finding that neural network readers perform well in cases that involve selecting a name from the context based on dialogue or discourse cues but struggle when coreference resolution or external knowledge is needed.

MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text

This paper presents MCTest, a freely available set of stories and associated questions intended for research on the machine comprehension of text. MCTest requires machines to answer multiple-choice reading comprehension questions about fictional stories, directly tackling the high-level goal of open-domain machine comprehension.

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

The paper shows that, compared with other recently introduced large-scale datasets, TriviaQA has relatively complex, compositional questions and considerable syntactic and lexical variability between questions and the corresponding answer-evidence sentences, and requires more cross-sentence reasoning to find answers.

The NarrativeQA Reading Comprehension Challenge

This paper presents a new dataset and set of tasks in which the reader must answer questions about stories by reading entire books or movie scripts. The tasks are designed so that successfully answering the questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience.

RACE: Large-scale ReAding Comprehension Dataset From Examinations

The proportion of questions that require reasoning is much larger in RACE than in other reading comprehension benchmark datasets, and there is a significant gap between the performance of state-of-the-art models and the human performance ceiling.