Crowdsourcing Multiple Choice Science Questions

@article{Welbl2017CrowdsourcingMC,
  title={Crowdsourcing Multiple Choice Science Questions},
  author={Johannes Welbl and Nelson F. Liu and Matt Gardner},
  journal={ArXiv},
  year={2017},
  volume={abs/1707.06209}
}
We present a novel method for obtaining high-quality, domain-targeted multiple choice questions from crowd workers. Generating these questions can be difficult without trading away originality, relevance or diversity in the answer options. Our method addresses these problems by leveraging a large corpus of domain-specific text and a small set of existing questions. It produces model suggestions for document selection and answer distractor choice which aid the human question generation process… 
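
The trained suggestion models are not reproduced in this listing; as a hedged illustration of the distractor-suggestion step, the sketch below proposes answer distractors by nearest-neighbor search in a pretrained embedding space. The GloVe vectors and the filtering heuristic are assumptions for illustration, not the paper's model.

```python
# Minimal sketch of embedding-based distractor suggestion (an illustrative
# assumption, not the paper's trained model): propose answer options that
# are semantically close to the correct answer.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # pretrained GloVe vectors

def suggest_distractors(answer: str, k: int = 3) -> list[str]:
    """Return k embedding-space neighbors of `answer` as candidate distractors."""
    neighbors = vectors.most_similar(answer.lower(), topn=k + 5)
    # Drop trivial variants of the answer itself (plurals, substrings).
    return [w for w, _ in neighbors if answer.lower() not in w][:k]

print(suggest_distractors("mitochondria"))
```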

Quiz-Style Question Generation for News Stories

This work proposes a series of novel techniques for applying large pre-trained Transformer encoder-decoder models, namely PEGASUS and T5, to the tasks of question-answer generation and distractor generation, and shows that these models outperform strong baselines using both automated metrics and human raters.
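
A minimal sketch of answer-aware question generation with a seq2seq Transformer follows, in the spirit of the PEGASUS/T5 systems summarized above. The checkpoint name (`valhalla/t5-base-qg-hl`, a community model) and its `<hl>` answer-highlighting convention are assumptions, not the paper's released models.

```python
# Hedged sketch of answer-aware question generation with a seq2seq
# Transformer. The checkpoint and its "<hl>" answer-highlight convention
# are assumptions (a community question-generation model), not the
# paper's released systems.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "valhalla/t5-base-qg-hl"  # assumed community checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

text = ("generate question: The mitochondrion is the <hl> powerhouse <hl> "
        "of the cell.")
inputs = tokenizer(text, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```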

Unsupervised multiple-choice question generation for out-of-domain Q&A fine-tuning

It is shown that the state-of-the-art model UnifiedQA can greatly benefit from a rule-based algorithm that generates questions and answers from unannotated sentences on a multiple-choice benchmark about physics, biology and chemistry it has never been trained on.
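
The paper's rule set is richer than can be shown here; the sketch below illustrates the general idea of turning an unannotated sentence into a question-answer pair, assuming spaCy for part-of-speech tagging and a crude noun-selection heuristic.

```python
# Toy rule-based question generation from unannotated text (the paper's
# rules are richer): blank out a noun to form a cloze question whose
# answer is that noun.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English pipeline

def sentence_to_cloze(sentence: str) -> tuple[str, str] | None:
    doc = nlp(sentence)
    nouns = [t for t in doc if t.pos_ == "NOUN"]
    if not nouns:
        return None
    answer = max(nouns, key=lambda t: len(t.text))  # crude salience heuristic
    return sentence.replace(answer.text, "_____", 1), answer.text

print(sentence_to_cloze("Photosynthesis converts light energy into chemical energy."))
```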

Improving Question Answering with External Knowledge

This work explores simple yet effective methods for exploiting two sources of unstructured external knowledge for multiple-choice question answering in subject areas such as science.

Knowledge-Driven Distractor Generation for Cloze-style Multiple Choice Questions

A novel configurable framework automatically generates distractor choices for open-domain cloze-style multiple-choice questions; the generated distractors outperform those of previous methods in both automatic and human evaluation.
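
As a hedged stand-in for the framework's knowledge sources, the sketch below draws distractors from WordNet co-hyponyms, i.e. terms that share a hypernym with the correct answer. The real framework is configurable and uses richer knowledge than this.

```python
# WordNet co-hyponyms as a stand-in knowledge source for distractor
# generation. Requires: import nltk; nltk.download("wordnet").
from nltk.corpus import wordnet as wn

def cohyponym_distractors(answer: str, k: int = 3) -> list[str]:
    """Terms sharing a hypernym with `answer`, e.g. other metals for 'iron'."""
    out: list[str] = []
    for syn in wn.synsets(answer):
        for hyper in syn.hypernyms():
            for sibling in hyper.hyponyms():
                for lemma in sibling.lemma_names():
                    name = lemma.replace("_", " ")
                    if name.lower() != answer.lower() and name not in out:
                        out.append(name)
    return out[:k]

print(cohyponym_distractors("iron"))
```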

Answering Science Exam Questions Using Query Reformulation with Background Knowledge

This paper presents a system that reformulates a given question into queries that are used to retrieve supporting text from a large corpus of science-related text and outperforms several strong baselines on the ARC dataset.
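
A minimal sketch of the reformulation idea follows, with a hand-rolled stopword filter standing in for the system's learned rewriting: keep the question's content terms and use them as a retrieval query.

```python
# Hand-rolled stopword filter standing in for learned query reformulation:
# keep the question's content terms as a retrieval query.
STOPWORDS = {"which", "of", "the", "a", "an", "is", "are", "what", "how",
             "following", "best", "to", "in", "for", "on", "and", "does"}

def reformulate(question: str) -> str:
    terms = (w.strip("?.,!").lower() for w in question.split())
    return " ".join(w for w in terms if w and w not in STOPWORDS)

q = "Which of the following best explains how photosynthesis stores energy?"
print(reformulate(q))  # -> "explains photosynthesis stores energy"
```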

Investigating Crowdsourcing to Generate Distractors for Multiple-Choice Assessments

The results suggest that crowdsourcing can be a very useful tool for generating effective distractors (attractive to subjects who do not understand the targeted concept), and that this method is faster, easier, and cheaper than the traditional approach of having one or more experts draft distractors.

Ranking Multiple Choice Question Distractors using Semantically Informed Neural Networks

Experimental results demonstrate that the proposed CNN-BiLSTM model surpasses the existing baseline models, and that intelligently incorporating word-level semantic information along with context-specific word embeddings boosts the predictive performance of distractor ranking, a promising direction for further research.
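
A shape-only sketch of such a CNN-BiLSTM ranker follows; the hyperparameters, the input convention (question concatenated with a candidate distractor), and the scoring head are assumptions for illustration.

```python
# Shape-only sketch of a CNN-BiLSTM distractor ranker: a convolution
# extracts local n-gram features, a BiLSTM adds context, and a linear
# head scores each candidate sequence.
import torch
import torch.nn as nn

class CnnBiLstmRanker(nn.Module):
    def __init__(self, vocab_size=10_000, emb=100, channels=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.conv = nn.Conv1d(emb, channels, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(channels, hidden, bidirectional=True,
                            batch_first=True)
        self.score = nn.Linear(2 * hidden, 1)

    def forward(self, token_ids):                      # (batch, seq)
        x = self.embed(token_ids)                      # (batch, seq, emb)
        x = torch.relu(self.conv(x.transpose(1, 2)))   # (batch, ch, seq)
        out, _ = self.lstm(x.transpose(1, 2))          # (batch, seq, 2*hidden)
        return self.score(out[:, -1])                  # one score per sequence

# Each input sequence encodes question + candidate distractor; a higher
# score means a better distractor.
ranker = CnnBiLstmRanker()
print(ranker(torch.randint(0, 10_000, (4, 20))).shape)  # torch.Size([4, 1])
```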

Ranking Distractors for Multiple Choice Questions

Answering Science Exam Questions Using Query Rewriting with Background Knowledge

A system is presented that rewrites a given question into queries used to retrieve supporting text from a large corpus of science-related text; it outperforms several strong baselines on the ARC dataset.

Generating Answer Candidates for Quizzes and Answer-Aware Question Generators

This work proposes a model that can generate a specified number of answer candidates for a given passage of text, which can then be used by instructors to write questions manually or can be passed as an input to automatic answer-aware question generators.
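
The proposed candidate generator is learned; as a trivial hedged stand-in that shows the same interface, the sketch below proposes noun chunks from a passage as answer candidates, assuming spaCy's `en_core_web_sm` pipeline.

```python
# Trivial stand-in for the learned candidate generator: propose noun
# chunks from a passage as answers for a downstream answer-aware
# question generator.
import spacy

nlp = spacy.load("en_core_web_sm")

def answer_candidates(passage: str, k: int = 5) -> list[str]:
    chunks = {c.text for c in nlp(passage).noun_chunks}
    # Longer chunks first, as a crude proxy for informativeness.
    return sorted(chunks, key=len, reverse=True)[:k]

print(answer_candidates("The water cycle moves water between the ocean, "
                        "the atmosphere, and the land."))
```
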
...

SQuAD: 100,000+ Questions for Machine Comprehension of Text

A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).

Good Question! Statistical Ranking for Question Generation

This work uses manually written rules to perform a sequence of general purpose syntactic transformations to turn declarative sentences into questions, which are ranked by a logistic regression model trained on a small, tailored dataset consisting of labeled output from the system.
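
One toy instance of such a transformation rule is sketched below; the actual system applies a full sequence of syntactic rules and then ranks the outputs with its trained logistic regression model.

```python
# One toy declarative-to-question transformation rule, standing in for
# the system's full rule sequence and learned ranker.
def declarative_to_question(sentence: str) -> str | None:
    """Turn 'X is Y.' into 'What is X?' via a single copula rule."""
    body = sentence.rstrip(". ")
    if " is " not in body:
        return None
    subject = body.split(" is ", 1)[0]
    if not subject:
        return None
    return f"What is {subject[0].lower() + subject[1:]}?"

print(declarative_to_question("The mitochondrion is the powerhouse of the cell."))
# -> "What is the mitochondrion?"
```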

Who did What: A Large-Scale Person-Centered Cloze Dataset

A new "Who-did-What" dataset of over 200,000 fill-in-the-gap (cloze) multiple choice reading comprehension problems constructed from the LDC English Gigaword newswire corpus is constructed and proposed as a challenge task for the community.

Large-scale Simple Question Answering with Memory Networks

This paper studies the impact of multitask and transfer learning for simple question answering, a setting in which the reasoning required to answer is quite easy as long as one can retrieve the correct evidence for a given question, which can be difficult in large-scale conditions.

Text Understanding with the Attention Sum Reader Network

A new, simple model is presented that uses attention to directly pick the answer from the context as opposed to computing the answer using a blended representation of words in the document as is usual in similar models, making the model particularly suitable for question-answering problems where the answer is a single word from the document.
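
The attention-sum step itself is simple enough to show in isolation; in the sketch below the encoders are omitted and replaced by random tensors, so only the candidate-scoring mechanism is real.

```python
# The attention-sum mechanism in isolation: score every document position
# against the query, softmax, then sum the probability mass landing on
# each candidate answer's occurrences in the document.
import torch

def attention_sum(doc_enc, query_enc, doc_ids, candidate_ids):
    """doc_enc: (seq, d); query_enc: (d,); doc_ids: (seq,) token ids."""
    attn = torch.softmax(doc_enc @ query_enc, dim=0)        # (seq,)
    scores = [attn[doc_ids == c].sum() for c in candidate_ids]
    return torch.stack(scores)  # one probability mass per candidate

doc_enc = torch.randn(12, 8)    # stand-in for contextual encodings
query_enc = torch.randn(8)
doc_ids = torch.randint(0, 5, (12,))
print(attention_sum(doc_enc, query_enc, doc_ids, [1, 2, 3]))
```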

WikiQA: A Challenge Dataset for Open-Domain Question Answering

The WIKIQA dataset is described, a new publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering, which is more than an order of magnitude larger than the previous dataset.

Automatic Generation of Challenging Distractors Using Context-Sensitive Inference Rules

This work proposes to employ context-sensitive lexical inference rules in order to generate distractors that are semantically similar to the gap target word in some sense, but not in the particular sense induced by the gap-fill context.
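
A hedged sketch of the underlying intuition follows, with a simple embedding heuristic standing in for the paper's context-sensitive inference rules: prefer candidates similar to the gap word overall but a poor fit for this specific gap-fill context.

```python
# Embedding heuristic standing in for context-sensitive inference rules
# (an illustrative assumption): rank distractor candidates by similarity
# to the target word minus their fit to the surrounding context.
import numpy as np
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_distractors(target, context_words, candidates):
    ctx = np.mean([vectors[w] for w in context_words if w in vectors], axis=0)
    def score(c):
        # High similarity to the target keeps the distractor plausible;
        # low similarity to the context keeps it wrong for this gap.
        return cos(vectors[target], vectors[c]) - cos(ctx, vectors[c])
    return sorted((c for c in candidates if c in vectors), key=score,
                  reverse=True)

print(rank_distractors("bank", ["river", "water"],
                       ["finance", "shore", "loan"]))
```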

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

This new dataset is aimed to overcome a number of well-known weaknesses of previous publicly available datasets for the same task of reading comprehension and question answering, and is the most comprehensive real-world dataset of its kind in both quantity and quality.

Answering Elementary Science Questions by Constructing Coherent Scenes using Background Knowledge

This work shows that by using a simple “knowledge graph” representation of the question, it can leverage several large-scale linguistic resources to provide missing background knowledge, somewhat alleviating the knowledge bottleneck in previous approaches.

A Selection Strategy to Improve Cloze Question Quality

We present a strategy to improve the quality of automatically generated cloze and open cloze questions which are used by the REAP tutoring system for assessment in the ill-defined domain of English as a second language.