Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension

@article{Sun2020InvestigatingPK,
  title={Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension},
  author={Kai Sun and Dian Yu and Dong Yu and Claire Cardie},
  journal={Transactions of the Association for Computational Linguistics},
  year={2020},
  volume={8},
  pages={141-155}
}
Machine reading comprehension tasks require a machine reader to answer questions relevant to the given document. In this paper, we present the first free-form multiple-Choice Chinese machine reading Comprehension dataset (C3), containing 13,369 documents (dialogues or more formally written mixed-genre texts) and their associated 19,577 multiple-choice free-form questions collected from Chinese-as-a-second-language examinations. We present a comprehensive analysis of the prior knowledge (i.e… 
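
To make the task format concrete, the sketch below shows a minimal Python representation of a free-form multiple-choice MRC instance and the accuracy metric typically reported for such datasets; the field names and the toy dialogue are illustrative assumptions, not the official C³ data schema.

# Illustrative sketch only: a hypothetical record layout for a free-form
# multiple-choice MRC instance of the kind described above, plus the accuracy
# metric typically reported; field names are assumptions, not the C³ schema.
from dataclasses import dataclass
from typing import List

@dataclass
class MCInstance:
    document: str       # dialogue or mixed-genre passage
    question: str       # free-form question about the document
    choices: List[str]  # candidate answers
    answer: int         # index of the correct choice

def accuracy(instances: List[MCInstance], predictions: List[int]) -> float:
    """Fraction of questions whose predicted choice matches the gold answer."""
    correct = sum(int(p == inst.answer) for inst, p in zip(instances, predictions))
    return correct / len(instances)

example = MCInstance(
    document="男：今天晚上我们去看电影吧。女：好啊，不过我不喜欢恐怖片，我最喜欢喜剧片。",
    question="女的最喜欢哪种电影？",
    choices=["恐怖片", "爱情片", "喜剧片"],
    answer=2,
)
print(accuracy([example], [2]))  # 1.0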

Enhancing Lexical-Based Approach With External Knowledge for Vietnamese Multiple-Choice Machine Reading Comprehension

TLDR
This work constructs a dataset which consists of 2,783 pairs of multiple-choice questions and answers based on 417 Vietnamese texts and proposes a lexical-based MRC method that utilizes semantic similarity measures and external knowledge sources to analyze questions and extract answers from the given text.
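
As a rough illustration of how a lexical similarity baseline of this kind can work (a sketch of the general idea only, not the authors' method, and omitting the external knowledge component), one can score each option by the TF-IDF cosine similarity between the passage and the question paired with that option:

# Minimal lexical-similarity answer selector for multiple-choice MRC.
# Generic TF-IDF baseline, not the paper's exact method.
from typing import List
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def pick_option(passage: str, question: str, options: List[str]) -> int:
    """Return the index of the option most lexically similar to the passage."""
    candidates = [f"{question} {opt}" for opt in options]
    vectorizer = TfidfVectorizer().fit([passage] + candidates)
    passage_vec = vectorizer.transform([passage])
    candidate_vecs = vectorizer.transform(candidates)
    scores = cosine_similarity(candidate_vecs, passage_vec).ravel()
    return int(scores.argmax())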

ExpMRC: explainability evaluation for machine reading comprehension

Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge

TLDR
This paper aims to extract a new kind of structured knowledge from scripts and use it to improve MRC, and designs a teacher-student paradigm with multiple teachers to facilitate the transfer of knowledge in weakly-labeled MRC data.
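
The following sketch illustrates the general multi-teacher distillation idea named here, in which several teachers' softened answer distributions are averaged and the student is trained toward them; the paper's actual teachers, weighting scheme, and weakly-labeled data pipeline are not reproduced.

# Generic multi-teacher knowledge-distillation loss (PyTorch), sketched to
# illustrate the teacher-student idea; not the paper's exact training recipe.
from typing import List
import torch
import torch.nn.functional as F

def multi_teacher_distill_loss(student_logits: torch.Tensor,
                               teacher_logits: List[torch.Tensor],
                               temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between the student's distribution over answer options
    and the average of the teachers' softened distributions."""
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits]
    ).mean(dim=0)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2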

Native Chinese Reader: A Dataset Towards Native-Level Chinese Machine Reading Comprehension

TLDR
Native Chinese Reader (NCR) is presented, a new machine reading comprehension (MRC) dataset with particularly long articles in both modern and classical Chinese; results indicate a performance gap between current MRC models and native Chinese speakers.

GCRC: A New Challenging MRC Dataset from Gaokao Chinese for Explainable Evaluation

TLDR
This paper proposes GCRC, a new dataset of challenging, high-quality multiple-choice questions collected from Gaokao Chinese (the Chinese subject of China's National College Entrance Examination), and shows that the dataset is more challenging than existing benchmarks and useful for identifying the limitations of existing MRC systems in an explainable way.

Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question Answering Data

TLDR
A self-teaching paradigm is proposed to better use generated weakly-labeled MRC instances to improve a target MRC task; experiments demonstrate the effectiveness of this framework and the usefulness of large-scale subject-area question-answering data for machine reading comprehension.

A Chinese Machine Reading Comprehension Dataset Automatic Generated Based on Knowledge Graph

TLDR
This paper proposes an automatic method for MRC dataset generation and builds CMedRC, presently the largest Chinese medical reading comprehension dataset, which contains 17k questions produced by the automatic method along with some seed questions.

Contrastive Learning between Classical and Modern Chinese for Classical Chinese Machine Reading Comprehension

TLDR
A contrastive learning method between classical and modern Chinese is proposed to reach a deep understanding of the two different styles; it improves language understanding ability and outperforms existing PLMs on the Haihua, CCLUE, and ChID datasets.
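
A minimal sketch of the contrastive objective such a method might use, assuming aligned classical/modern sentence pairs and an encoder that produces one embedding per sentence (an InfoNCE-style loss, not the paper's exact formulation):

# InfoNCE-style contrastive loss over paired classical/modern sentence
# embeddings (PyTorch); a generic sketch, not the paper's exact objective.
import torch
import torch.nn.functional as F

def info_nce(classical_emb: torch.Tensor, modern_emb: torch.Tensor,
             temperature: float = 0.05) -> torch.Tensor:
    """classical_emb, modern_emb: [batch, dim]; row i of each matrix is an
    aligned translation pair. Matching pairs are positives, all other rows
    in the batch serve as in-batch negatives."""
    classical = F.normalize(classical_emb, dim=-1)
    modern = F.normalize(modern_emb, dim=-1)
    logits = classical @ modern.t() / temperature  # cosine similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)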

Study on Text Comprehension and MCQA in Spanish

TLDR
This work explores encoder models, generative models, clue generation systems, and dataset expansion for the text comprehension and MCQA tasks in ReCoRES (Reading Comprehension and Reasoning Explanation for Spanish).

Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models

TLDR
Experimental results demonstrate that pre-training models using the proposed approach followed by fine-tuning achieve significant improvements over previous state-of-the-art models on two commonsense-related benchmarks, including CommonsenseQA and Winograd Schema Challenge.

References

Showing 1-10 of 73 references

Improving Machine Reading Comprehension with General Reading Strategies

TLDR
Three general strategies aimed at improving non-extractive machine reading comprehension (MRC) are proposed, and experiments demonstrate the effectiveness of these strategies as well as the versatility and general applicability of fine-tuned models that incorporate them.

DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension

TLDR
DREAM is the first dialogue-based multiple-choice reading comprehension dataset to focus on in-depth multi-turn multi-party dialogue understanding; experimental results on it show the effectiveness of dialogue structure and general world knowledge.

DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications

TLDR
This paper introduces DuReader, a new large-scale, open-domain Chinese machine reading comprehension (MRC) dataset, designed to address real-world MRC, and organizes a shared competition to encourage the exploration of more models.

Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences

TLDR
The dataset is the first to study multi-sentence inference at scale, with an open-ended set of question types that require reasoning skills; human solvers achieve an F1-score of 88.1%.

Broad Context Language Modeling as Reading Comprehension

TLDR
This work views LAMBADA as a reading comprehension problem and applies comprehension models based on neural networks, finding that neural network readers perform well in cases that involve selecting a name from the context based on dialogue or discourse cues but struggle when coreference resolution or external knowledge is needed.

MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text

TLDR
MCTest is presented, a freely available set of stories and associated questions intended for research on the machine comprehension of text that requires machines to answer multiple-choice reading comprehension questions about fictional stories, directly tackling the high-level goal of open-domain machine comprehension.

The NarrativeQA Reading Comprehension Challenge

TLDR
A new dataset and set of tasks in which the reader must answer questions about stories by reading entire books or movie scripts are presented, designed so that successfully answering their questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience.

RACE: Large-scale ReAding Comprehension Dataset From Examinations

TLDR
The proportion of questions that require reasoning is much larger in RACE than in other benchmark datasets for reading comprehension, and there is a significant gap between the performance of state-of-the-art models and ceiling human performance.

Constructing Datasets for Multi-hop Reading Comprehension Across Documents

TLDR
A novel task is proposed to encourage the development of models for text understanding across multiple documents and to investigate the limits of existing methods; in this task, a model must learn to seek and combine evidence, effectively performing multi-hop (multi-step) inference.

QuAC: Question Answering in Context

TLDR
QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context, as a detailed qualitative evaluation shows.
...