DuoRC: Towards Complex Language Understanding with Paraphrased Reading Comprehension

Amrita Saha, Rahul Aralikatte, Mitesh M. Khapra, Karthik Sankaranarayanan. Annual Meeting of the Association for Computational Linguistics.
We propose DuoRC, a novel dataset for Reading Comprehension (RC) that motivates several new challenges for neural approaches in language understanding beyond those offered by existing RC datasets. DuoRC contains 186,089 unique question-answer pairs created from a collection of 7680 pairs of movie plots where each pair in the collection reflects two versions of the same movie - one from Wikipedia and the other from IMDb - written by two different authors. We asked crowdsourced workers to create… 


Multi-style Generative Reading Comprehension

This study tackles generative reading comprehension (RC), which consists of answering questions based on textual evidence and natural language generation (NLG). We propose a multi-style abstractive…

IIRC: A Dataset of Incomplete Information Reading Comprehension Questions

A dataset of more than 13K questions over paragraphs from English Wikipedia that provide only partial information to answer them, with the missing information occurring in one or more linked documents; a baseline model achieves 31.1% F1 on this task, while estimated human performance is 88.4%.

Comprehensive Multi-Dataset Evaluation of Reading Comprehension

An evaluation server, ORB, is presented, that reports performance on seven diverse reading comprehension datasets, encouraging and facilitating testing a single model’s capability in understanding a wide variety of reading phenomena.

What Makes Reading Comprehension Questions Easier?

This study proposes simple heuristics to split each dataset into easy and hard subsets, examines the performance of two baseline models on each subset, and observes that baseline performance on the hard subsets degrades markedly compared to performance on the full datasets.

A Survey on Machine Reading Comprehension Systems

It is demonstrated that the focus of research has changed in recent years from answer extraction to answer generation, from single- to multi-document reading comprehension, and from learning from scratch to using pre-trained word vectors.

Generalizing Question Answering System with Pre-trained Language Model Fine-tuning

A multi-task learning framework is proposed that learns a shared representation across different tasks, built on top of a large pre-trained language model and then fine-tuned on multiple RC datasets.

ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension

An evaluation server, ORB, is presented, that reports performance on seven diverse reading comprehension datasets, encouraging and facilitating testing a single model's capability in understanding a wide variety of reading phenomena.

Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond

The survey concludes that 1) MRC boosts progress from language processing to language understanding; 2) the rapid improvement of MRC systems greatly benefits from the development of contextualized language models (CLMs); and 3) the theme of MRC is gradually moving from shallow text matching to cognitive reasoning.

Assessing the Benchmarking Capacity of Machine Reading Comprehension Datasets

Results suggest that most of the questions already answered correctly by the model do not necessarily require grammatical and complex reasoning, and therefore, MRC datasets will need to take extra care in their design to ensure that questions can correctly evaluate the intended skills.

DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs

A new reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs, and presents a new model that combines reading comprehension methods with simple numerical reasoning to achieve 51% F1.

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

This new dataset aims to overcome a number of well-known weaknesses of previous publicly available datasets for the same task of reading comprehension and question answering, and is the most comprehensive real-world dataset of its kind in both quantity and quality.

S-Net: From Answer Extraction to Answer Generation for Machine Reading Comprehension

The answer extraction model is first employed to predict the most important sub-spans from the passage as evidence, and the answer synthesis model takes the evidence as additional features along with the question and passage to further elaborate the final answers.

The NarrativeQA Reading Comprehension Challenge

A new dataset and set of tasks in which the reader must answer questions about stories by reading entire books or movie scripts are presented, designed so that successfully answering their questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience.

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

It is shown that, in comparison to other recently introduced large-scale datasets, TriviaQA has relatively complex, compositional questions, exhibits considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and requires more cross-sentence reasoning to find answers.

MovieQA: Understanding Stories in Movies through Question-Answering

The MovieQA dataset, which aims to evaluate automatic story comprehension from both video and text, is introduced and existing QA techniques are extended to show that question-answering with such open-ended semantics is hard.

SQuAD: 100,000+ Questions for Machine Comprehension of Text

A strong logistic regression model is built for the task that achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).

MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text

MCTest is presented, a freely available set of stories and associated questions intended for research on the machine comprehension of text that requires machines to answer multiple-choice reading comprehension questions about fictional stories, directly tackling the high-level goal of open-domain machine comprehension.

Who did What: A Large-Scale Person-Centered Cloze Dataset

A new "Who-did-What" dataset of over 200,000 fill-in-the-gap (cloze) multiple-choice reading comprehension problems, constructed from the LDC English Gigaword newswire corpus, is proposed as a challenge task for the community.

NewsQA: A Machine Comprehension Dataset

NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs, is presented and analysis confirms that NewsQA demands abilities beyond simple word matching and recognizing textual entailment.

A Corpus and Evaluation Framework for Deeper Understanding of Commonsense Stories

A new framework for evaluating story understanding and script learning is presented: the 'Story Cloze Test', which requires a system to choose the correct ending to a four-sentence story, together with a new corpus of ~50k five-sentence commonsense stories, ROCStories, to enable this evaluation.