Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus

Iulian Serban, Alberto García-Durán, Çaglar Gülçehre, Sungjin Ahn, A. P. Sarath Chandar, Aaron C. Courville, Yoshua Bengio
Over the past decade, large-scale supervised learning corpora have enabled machine learning researchers to make substantial advances. [...] Furthermore, when presented to human evaluators, the generated questions appear comparable in quality to real human-generated questions.

Large-Scale Simple Question Generation by Template-Based Seq2seq Learning

A 28M Chinese Q&A corpus based on the Chinese knowledge base provided by the NLPCC2017 KBQA challenge is presented, and a novel neural network architecture that combines a template-based method with seq2seq learning to generate highly fluent and diverse questions is proposed.

Question Generation for Question Answering

Experimental results show that, by using generated questions as an extra signal, significant QA improvement can be achieved.

Chinese Neural Question Generation: Augmenting Knowledge into Multiple Neural Encoders

This study investigated how to incorporate knowledge triples into the sequence-to-sequence neural model to reduce contextual information loss and proposed a multi-encoder neural model for Chinese question generation.

Neural Question Generation from Text: A Preliminary Study

A preliminary study on neural question generation from text with the SQuAD dataset is conducted, and the experiment results show that the method can produce fluent and diverse questions.

CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations

This study aims to develop such resources for other languages by generating questions semi-automatically from the semantic Frame analysis of large corpora, applying the method to the CALOR-Frame French corpus.

Question Generation using Deep Neural Networks

An attention-based recurrent neural model is developed that generates questions from sentences, and the Transformer, a novel neural network architecture whose use in question generation has not been reported to date, is explored.

Harvesting Paragraph-level Question-Answer Pairs from Wikipedia

It is found that the linguistic knowledge introduced by the coreference representation aids question generation significantly, producing models that outperform the current state-of-the-art.

Automating Reading Comprehension by Generating Question and Answer Pairs

A novel two-stage process is presented for generating question-answer pairs from text, using sequence-to-sequence models with global attention and answer encoding to generate the question most relevant to the answer.

Dataset and Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering

This work casts neural QA as a sequence labeling problem and proposes an end-to-end sequence labeling model, which overcomes all the above challenges and outperforms the baselines significantly on WebQA.

Machine Comprehension by Text-to-Text Neural Question Generation

A recurrent neural model is proposed that generates natural-language questions from documents, conditioned on answers, and the model is fine-tuned using policy gradient techniques to maximize several rewards that measure question quality.



Open Question Answering with Weakly Supervised Embedding Models

This paper empirically demonstrates that the model can capture meaningful signals from its noisy supervision, leading to major improvements over Paralex, the only existing method able to be trained on similar weakly labeled data.

Automation of Question Generation From Sentences

A system is considered that automates question generation from a sentence, producing all possible questions whose answers the sentence contains.

Large-scale Simple Question Answering with Memory Networks

This paper studies the impact of multitask and transfer learning for simple question answering; a setting for which the reasoning required to answer is quite easy, as long as one can retrieve the correct evidence given a question, which can be difficult in large-scale conditions.

Question Generation based on Lexico-Syntactic Patterns Learned from the Web

The question generation task as performed by THE-MENTOR is detailed, and several techniques are applied to discard low-quality items.

Question Generation with Minimal Recursion Semantics

The performance of the proposed method is compared against other syntax-based and rule-based systems, and the results reveal the challenges of current research on question generation and indicate directions for future work.

Semantics-based Question Generation and Implementation

This paper presents a question generation system based on the approach of semantic rewriting, and shows a principled way of generating questions without ad-hoc manipulation of the syntactic structures.

Web question answering: is more always better?

This paper describes a question answering system that is designed to capitalize on the tremendous amount of data that is now available online, and uses the redundancy available in large corpora as an important resource to simplify the query rewrites and support answer mining from returned snippets.

Sequence to Sequence Learning with Neural Networks

This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. It finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence, which made the optimization problem easier.

Generating Natural Language from Linked Data: Unsupervised template extraction

An architecture for generating natural language from Linked Data is presented that automatically learns sentence templates and statistical document planning from parallel RDF datasets and text; it significantly outperforms the baseline on two of three measures: non-redundancy, and structure and coherence.

Addressing the Rare Word Problem in Neural Machine Translation

This paper proposes and implements an effective technique to address the problem of end-to-end neural machine translation's inability to correctly translate very rare words, and is the first to surpass the best result achieved on a WMT’14 contest task.