Corpus ID: 1289517

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

@article{Campos2016MSMA,
  title={MS MARCO: A Human Generated MAchine Reading COmprehension Dataset},
  author={Daniel Fernando Campos and Tri Nguyen and Mir Rosenberg and Xia Song and Jianfeng Gao and Saurabh Tiwary and Rangan Majumder and Li Deng and Bhaskar Mitra},
  journal={ArXiv},
  year={2016},
  volume={abs/1611.09268}
}
This paper presents our recent work on the design and development of a new, large-scale dataset for MAchine Reading COmprehension, which we name MS MARCO. The dataset aims to overcome a number of well-known weaknesses of previous publicly available datasets for reading comprehension and question answering. In MS MARCO, all questions are sampled from real anonymized user queries. The context passages, from which answers in the dataset are derived, are extracted from… 

DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications
TLDR
This paper introduces DuReader, a new large-scale, open-domain Chinese machine reading comprehension (MRC) dataset, designed to address real-world MRC, and organizes a shared competition to encourage the exploration of more models.
Quasar: Datasets for Question Answering by Search and Reading
We present two new large-scale datasets aimed at evaluating systems designed to comprehend a natural language query and extract its answer from a large corpus of text. The Quasar-S dataset consists
Improved Machine Reading Comprehension Using Data Validation for Weakly Labeled Data
TLDR
The proposed MRC model addresses the limitation of irrelevant context in MRC better than human supervision does, and shows a 4.33% improvement in performance on TriviaQA Wiki compared to the existing baseline model.
UQuAD1.0: Development of an Urdu Question Answering Training Data for Machine Reading Comprehension
TLDR
This work explores the semi-automated creation of the Urdu Question Answering Dataset (UQuAD1.0) by combining machine-translated SQuAD with human-generated samples derived from Wikipedia articles and Urdu RC worksheets from Cambridge O-level books.
S-Net: From Answer Extraction to Answer Generation for Machine Reading Comprehension
TLDR
The answer extraction model is first employed to predict the most important sub-spans from the passage as evidence, and the answer synthesis model takes the evidence as additional features along with the question and passage to further elaborate the final answers.
SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine
TLDR
It is shown that there is a meaningful gap between the human and machine performances, which suggests that the proposed dataset could well serve as a benchmark for question-answering.
ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion
TLDR
Quality analysis shows that ReCO requires various types of reasoning skills, such as causal inference and logical reasoning, which makes it a good challenge for machine reading comprehension.
A Span-Extraction Dataset for Chinese Machine Reading Comprehension
TLDR
This paper introduces a Span-Extraction dataset for Chinese machine reading comprehension to add language diversity in this area, and the authors also hosted the Second Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2018).
S-Net: From Answer Extraction to Answer Synthesis for Machine Reading Comprehension
TLDR
An extraction-then-synthesis framework is proposed that synthesizes answers from extraction results, using sequence-to-sequence neural networks with the extracted evidence as features; it outperforms state-of-the-art methods.
A Survey on Machine Reading Comprehension Systems
TLDR
It is demonstrated that the focus of research has changed in recent years from answer extraction to answer generation, from single to multi-document reading comprehension, and from learning from scratch to using pre-trained embeddings.
