Understanding and Improving Zero-shot Multi-hop Reasoning in Generative Question Answering

Authors: Zhengbao Jiang, J. Araki, Haibo Ding, Graham Neubig
Venue: International Conference on Computational Linguistics
Generative question answering (QA) models generate answers to questions either solely based on the parameters of the model (the closed-book setting) or additionally retrieving relevant evidence (the open-book setting). Generative QA models can answer some relatively complex questions, but the mechanism through which they do so is still poorly understood. We perform several studies aimed at better understanding the multi-hop reasoning capabilities of generative QA models. First, we decompose… 

Do Multi-Hop Question Answering Systems Know How to Answer the Single-Hop Sub-Questions?

A neural decomposition model generates sub-questions for a multi-hop question, and the corresponding sub-answers are then extracted, to shed light on how QA systems reason when answering complex questions.

Unsupervised Question Decomposition for Question Answering

An algorithm for One-to-N Unsupervised Sequence transduction (ONUS) that learns to map one hard, multi-hop question to many simpler, single-hop sub-questions, which is promising for shedding light on why a QA system makes a prediction.

Understanding Dataset Design Choices for Multi-hop Reasoning

This paper investigates two recently proposed datasets, WikiHop and HotpotQA, and explores sentence-factored models for these tasks; by design, these models cannot do multi-hop reasoning, but they are still able to solve a large number of examples in both datasets.

Multi-hop Reading Comprehension through Question Decomposition and Rescoring

A system that decomposes a compositional question into simpler sub-questions that can be answered by off-the-shelf single-hop RC models is proposed and a new global rescoring approach is introduced that considers each decomposition to select the best final answer, greatly improving overall performance.

Compositional Questions Do Not Necessitate Multi-hop Reasoning

This work introduces a single-hop BERT-based RC model that achieves 67 F1, comparable to state-of-the-art multi-hop models, and designs an evaluation setting where humans are not shown all of the necessary paragraphs for the intended multi-hop reasoning but can still answer over 80% of questions.

Answering Complex Open-domain Questions Through Iterative Query Generation

This work presents GoldEn (Gold Entity) Retriever, which iterates between reading context and retrieving more supporting documents to answer open-domain multi-hop questions, and demonstrates that it outperforms the best previously published model despite not using pretrained language models such as BERT.

Avoiding Reasoning Shortcuts: Adversarial Evaluation, Training, and Model Development for Multi-Hop QA

This paper shows that in the multi-hop HotpotQA dataset, the examples often contain reasoning shortcuts through which models can directly locate the answer by word-matching the question with a sentence in the context, and shows that the 2-hop model trained on the regular data is more robust to the adversaries than the baseline.

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

It is shown that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.

OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering

This paper proposes an omnivorous pretraining approach that consumes both natural and synthetic data to endow models with these respective abilities, and performs extensive experiments to demonstrate the superiority of the model OmniTab.

Break It Down: A Question Understanding Benchmark

This work introduces a Question Decomposition Meaning Representation (QDMR) for questions, and demonstrates the utility of QDMR by showing that it can be used to improve open-domain question answering on the HotpotQA dataset, and can be deterministically converted to a pseudo-SQL formal language, which can alleviate annotation in semantic parsing applications.
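
Several of the entries above describe answering a multi-hop question by breaking it into single-hop sub-questions. The toy sketch below illustrates only that general idea; it is not the method of any listed paper. The lookup table `KB` and the helpers `answer_single_hop` and `answer_multi_hop` are hypothetical names introduced for illustration, and a dictionary lookup stands in for a real single-hop QA model.

```python
from typing import Dict, List, Optional, Tuple

# Toy stand-in for a single-hop QA model: (entity, relation) -> answer.
# The table and its contents are illustrative assumptions.
KB: Dict[Tuple[str, str], str] = {
    ("Inception", "director"): "Christopher Nolan",
    ("Christopher Nolan", "birthplace"): "London",
}


def answer_single_hop(entity: str, relation: str) -> Optional[str]:
    """Answer one single-hop sub-question, or return None if unknown."""
    return KB.get((entity, relation))


def answer_multi_hop(start_entity: str, relations: List[str]) -> Optional[str]:
    """Chain single-hop answers: each hop's answer is the next hop's entity."""
    entity: Optional[str] = start_entity
    for relation in relations:
        entity = answer_single_hop(entity, relation)
        if entity is None:
            # A failed intermediate hop makes the whole chain unanswerable.
            return None
    return entity


# "Where was the director of Inception born?" as two hops:
# hop 1: director of Inception; hop 2: birthplace of that person.
print(answer_multi_hop("Inception", ["director", "birthplace"]))
```

In this sketch a failure at any intermediate hop propagates to the final answer. Shortcut-based answering, which some of the listed papers study, would bypass that dependency.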