Improving the Robustness of QA Models to Challenge Sets with Variational Question-Answer Pair Generation

Kazutoshi Shinoda, Saku Sugawara, Akiko Aizawa
Question answering (QA) models for reading comprehension have achieved human-level accuracy on in-distribution test sets. However, they have been demonstrated to lack robustness to challenge sets, whose distribution is different from that of training sets. Existing data augmentation methods mitigate this problem by simply augmenting training sets with synthetic examples sampled from the same distribution as the challenge sets. However, these methods assume that the distribution of a challenge… 


Can Question Generation Debias Question Answering Models? A Case Study on Question–Context Lexical Overlap
The proposed data augmentation approach is simple yet effective at mitigating the degradation problem with only 70k synthetic examples; it uses synonym replacement to augment questions with low lexical overlap.
An Understanding-Oriented Robust Machine Reading Comprehension Model
Although existing machine reading comprehension models are making rapid progress on many datasets, they are far from robust. In this paper, we propose an understanding-oriented machine reading…
Diverse Text Generation via Variational Encoder-Decoder Models with Gaussian Process Priors
A novel latent structured variable model is presented that generates high-quality texts by enriching the contextual representation learning of encoder-decoder models, and a variational inference approach is proposed to approximate the posterior distribution of random context variables.
How to Build Robust FAQ Chatbot with Controllable Question Generator?
The diversity controllable semantically valid adversarial attacker (DCSA) is proposed: a high-quality, diverse, controllable method that generates standard and adversarial samples with a semantic graph. The generated dataset is found to improve both the generalizability of the QA model to the new target domain and the robustness of the QA models in detecting unanswerable adversarial questions.


BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
NoiseQA: Challenge Set Evaluation for User-Centric Question Answering
The results show substantial room for progress before QA systems can be effectively deployed. The authors highlight the need for QA evaluation to expand to consider real-world use, and expect the findings to spur greater community interest in the issues that arise when systems actually need to be of utility to humans.
Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs
The Information Maximizing Hierarchical Conditional Variational AutoEncoder (Info-HCVAE) is validated against state-of-the-art baseline models on several benchmark datasets by evaluating the performance of a QA model trained either on the generated QA pairs alone (QA-based evaluation) or on both the generated and human-labeled pairs.
Asking Questions the Human Way: Scalable Question-Answer Generation from Text Corpus
Answer-Clue-Style-aware Question Generation (ACS-QG), which aims at automatically generating high-quality and diverse question-answer pairs from unlabeled text corpus at scale by imitating the way a human asks questions, dramatically outperforms state-of-the-art neural question generation models in terms of the generation quality.
Addressing Semantic Drift in Question Generation for Semi-Supervised Question Answering
This paper proposes two semantics-enhanced rewards obtained from downstream question paraphrasing and question answering tasks to regularize the QG model to generate semantically valid questions, and proposes a QA-based evaluation method which measures the model’s ability to mimic human annotators in generating QA training data.
Are Red Roses Red? Evaluating Consistency of Question-Answering Models
A method is proposed to automatically extract implications for instances from two QA datasets, VQA and SQuAD. The extracted implications are used to evaluate the consistency of models and are shown to be well formed and valid.
Improving the Robustness of Question Answering Systems to Question Paraphrasing
This work proposes a data augmentation approach that requires no human intervention to re-train the models for improved robustness to question paraphrasing and uses a neural paraphrase model trained to generate multiple paraphrased questions for a given source question and a set of paraphrase suggestions.
Synthetic QA Corpora Generation with Roundtrip Consistency
A novel method of generating synthetic question answering corpora is introduced by combining models of question generation and answer extraction, and by filtering the results to ensure roundtrip consistency, establishing a new state-of-the-art on SQuAD2 and NQ.
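Roundtrip consistency filtering, as summarized above, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three model callables (`extract_answers`, `generate_question`, `answer_question`) are hypothetical stand-ins for trained answer-extraction, question-generation, and QA models, and the consistency test is a normalized exact match, chosen here for simplicity.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical model interfaces (assumptions, not a real library API):
#   extract_answers(context) -> candidate answer spans
#   generate_question(context, answer) -> question string
#   answer_question(context, question) -> predicted answer string
AnswerExtractor = Callable[[str], List[str]]
QuestionGenerator = Callable[[str, str], str]
QuestionAnswerer = Callable[[str, str], str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not fail the check."""
    return " ".join(text.lower().split())


def roundtrip_filter(
    contexts: Iterable[str],
    extract_answers: AnswerExtractor,
    generate_question: QuestionGenerator,
    answer_question: QuestionAnswerer,
) -> List[Tuple[str, str, str]]:
    """Keep (context, question, answer) triples whose generated question,
    when answered by the QA model, recovers the answer it was generated from."""
    kept = []
    for context in contexts:
        for answer in extract_answers(context):
            question = generate_question(context, answer)
            predicted = answer_question(context, question)
            # Roundtrip consistency: the QA model must reproduce the original answer.
            if normalize(predicted) == normalize(answer):
                kept.append((context, question, answer))
    return kept


# Toy stand-ins for the three models (illustrative only).
ctx = "Paris is the capital of France."
result = roundtrip_filter(
    [ctx],
    extract_answers=lambda c: ["Paris", "France"],
    generate_question=lambda c, a: f"What is {a}?",
    answer_question=lambda c, q: "Paris",  # this toy QA model always answers "Paris"
)
```

In the toy run, the pair generated from "France" is discarded because the stand-in QA model answers "Paris"; only the consistent pair survives.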
Neural Machine Translation by Jointly Learning to Align and Translate
It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and it is proposed to extend this architecture by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
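The (soft-)search described above is additive attention. Below is a minimal pure-Python sketch of a single decoder step, assuming the scoring form e_j = v^T tanh(W s_prev + U h_j) over source annotations h_j; the function and parameter names (`additive_attention`, `W`, `U`, `v`, `s_prev`) and the toy values are illustrative, not taken from the paper's code.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply matrix m (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(scores: Vector) -> Vector:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(s_prev: Vector, annotations: List[Vector],
                       W: Matrix, U: Matrix, v: Vector) -> Tuple[Vector, Vector]:
    """One decoder step of additive attention.

    Scores each source annotation h_j with e_j = v^T tanh(W s_prev + U h_j),
    normalizes the scores with a softmax, and returns (weights, context),
    where context is the weighted average of the annotation vectors.
    """
    ws = matvec(W, s_prev)
    scores = []
    for h in annotations:
        hidden = [math.tanh(a + b) for a, b in zip(ws, matvec(U, h))]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(annotations[0])
    context = [sum(w * h[k] for w, h in zip(weights, annotations)) for k in range(dim)]
    return weights, context


# Toy example with 2-d states and identity projections (illustrative values only).
I2 = [[1.0, 0.0], [0.0, 1.0]]
weights, context = additive_attention(
    s_prev=[0.0, 0.0],
    annotations=[[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]],
    W=I2, U=I2, v=[1.0, 1.0],
)
```

The context vector is a soft, differentiable mixture of all source positions rather than a hard selection of one segment, which is what lets the decoder avoid squeezing the whole sentence into one fixed-length vector.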
Pointer Networks
Pointer Networks (Ptr-Nets) are a new neural architecture that learns the conditional probability of an output sequence whose elements are discrete tokens corresponding to positions in an input sequence, using a recently proposed neural attention mechanism. Ptr-Nets improve over sequence-to-sequence with input attention and also generalize to variable-size output dictionaries.
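The pointing step can be illustrated with a short sketch: attention scores u_j = v^T tanh(W1 e_j + W2 d) over encoder states e_j are normalized with a softmax and used directly as the output distribution over input positions. The names (`pointer_distribution`, `point`, `W1`, `W2`, `v`) and toy inputs are illustrative assumptions; the sketch uses greedy decoding for a single step only.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> Vector:
    """Multiply matrix m (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def pointer_distribution(encoder_states: List[Vector], decoder_state: Vector,
                         W1: Matrix, W2: Matrix, v: Vector) -> Vector:
    """Ptr-Net style step: u_j = v^T tanh(W1 e_j + W2 d), p = softmax(u).

    The result is a distribution over input positions, so its length equals
    the input length rather than the size of a fixed output vocabulary.
    """
    w2d = matvec(W2, decoder_state)
    scores = []
    for e in encoder_states:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(W1, e), w2d)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(u - top) for u in scores]
    total = sum(exps)
    return [x / total for x in exps]


def point(inputs: List[Vector], decoder_state: Vector,
          W1: Matrix, W2: Matrix, v: Vector) -> int:
    """Greedy decoding step: return the index of the most probable input position."""
    probs = pointer_distribution(inputs, decoder_state, W1, W2, v)
    return max(range(len(probs)), key=lambda j: probs[j])


# The same parameters score inputs of different lengths (illustrative values only).
I2 = [[1.0, 0.0], [0.0, 1.0]]
short_input = [[0.1, 0.1], [1.0, 1.0], [0.2, 0.0]]
long_input = [[0.0, 0.0], [0.5, 0.5], [3.0, 3.0], [1.0, 1.0], [0.0, 0.0]]
probs_long = pointer_distribution(long_input, [0.0, 0.0], I2, I2, [1.0, 1.0])
```

Because no parameter depends on the number of input positions, the same weights handle a 3-element and a 5-element input, which is the sense in which the output dictionary has variable size.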