Break It Down: A Question Understanding Benchmark

Tomer Wolfson, Mor Geva, Ankit Gupta, Matt Gardner, Yoav Goldberg, Daniel Deutch, and Jonathan Berant. Transactions of the Association for Computational Linguistics.
Understanding natural language questions entails the ability to break down a question into the requisite steps for computing its answer. In this work, we introduce a Question Decomposition Meaning Representation (QDMR) for questions. QDMR constitutes the ordered list of steps, expressed through natural language, that are necessary for answering a question. We develop a crowdsourcing pipeline, showing that quality QDMRs can be annotated at scale, and release the Break dataset, containing over… 
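To illustrate the idea of an ordered list of steps where later steps reference earlier ones, here is a minimal sketch in Python. The question and its decomposition are hypothetical examples in the QDMR style, not drawn from the Break dataset itself:

```python
import re

# Illustrative QDMR-style decomposition (hypothetical example, not from Break).
# Each step is a natural-language operation; "#k" refers to the result of step k.
question = "What movies did the director of Titanic make after 2000?"
qdmr_steps = [
    "return the director of Titanic",   # step 1
    "return movies by #1",              # step 2: uses the result of step 1
    "return #2 released after 2000",    # step 3: filters the result of step 2
]

def referenced_steps(step: str) -> list[int]:
    """Extract the step indices a QDMR step refers to via '#k' markers."""
    return [int(m) for m in re.findall(r"#(\d+)", step)]

# QDMR steps are ordered: every reference must point to an earlier step.
for i, step in enumerate(qdmr_steps, start=1):
    assert all(ref < i for ref in referenced_steps(step))
```

The key property encoded here is that the decomposition forms a directed acyclic chain of sub-computations, which is what makes the representation usable as an intermediate target for semantic parsing.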

Robust Question Answering Through Sub-part Alignment

This work models question answering as an alignment problem, decomposing both the question and the context into smaller units based on off-the-shelf semantic representations and aligning the question to a subgraph of the context in order to find the answer.

Unsupervised Question Decomposition for Question Answering

An algorithm for One-to-N Unsupervised Sequence transduction (ONUS) that learns to map one hard, multi-hop question to many simpler, single-hop sub-questions, which is promising for shedding light on why a QA system makes a prediction.

Detecting Frozen Phrases in Open-Domain Question Answering

The experiments reveal that detecting frozen phrases whose presence in answer documents is highly plausible yields significant improvements in retrieval as well as in the end-to-end accuracy of open-domain QA models.

SPARQLing Database Queries from Intermediate Question Decompositions

This work observes that the execution accuracy of queries constructed by the model on the challenging Spider dataset is comparable with the state-of-the-art text-to-SQL methods trained with annotated SQL queries.

Weakly Supervised Text-to-SQL Parsing through Question Decomposition

This work proposes a weak supervision approach for training text-to-SQL parsers that takes advantage of the recently proposed question meaning representation called QDMR, an intermediate between NL and formal query languages.

Break, Perturb, Build: Automatic Perturbation of Reasoning Paths Through Question Decomposition

This work introduces the “Break, Perturb, Build” (BPB) framework for automatic reasoning-oriented perturbation of question-answer pairs, and demonstrates the effectiveness of BPB by creating evaluation sets for three reading comprehension benchmarks, generating thousands of high-quality examples without human intervention.

QED: A Framework and Dataset for Explanations in Question Answering

A large user study is described showing that the presence of QED explanations significantly improves the ability of untrained raters to spot errors made by a strong neural QA baseline.

Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models

ModularQA is more versatile than existing explainable systems for the DROP and HotpotQA datasets, is more robust than state-of-the-art black-box (uninterpretable) systems, and generates more understandable and trustworthy explanations compared to prior work.

KILT: a Benchmark for Knowledge Intensive Language Tasks

It is found that a shared dense vector index coupled with a seq2seq model is a strong baseline, outperforming more tailor-made approaches for fact checking, open-domain question answering and dialogue, and yielding competitive results on entity linking and slot filling, by generating disambiguated text.

Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies

This work introduces StrategyQA, a question answering benchmark where the required reasoning steps are implicit in the question, and should be inferred using a strategy, and proposes a data collection procedure that combines term-based priming to inspire annotators, careful control over the annotator population, and adversarial filtering for eliminating reasoning shortcuts.

Learning to Reason: End-to-End Module Networks for Visual Question Answering

End-to-End Module Networks are proposed, which learn to reason by directly predicting instance-specific network layouts without the aid of a parser, and achieve an error reduction of nearly 50% relative to state-of-the-art attentional approaches.

Natural Questions: A Benchmark for Question Answering Research

The Natural Questions corpus, a question answering data set, is presented, introducing robust metrics for the purposes of evaluating question answering systems; demonstrating high human upper bounds on these metrics; and establishing baseline results using competitive methods drawn from related literature.

SQuAD: 100,000+ Questions for Machine Comprehension of Text

A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).

Crowdsourcing Question-Answer Meaning Representations

A crowdsourcing scheme is developed to show that QAMRs can be labeled with very little training, and a qualitative analysis demonstrates that the crowd-generated question-answer pairs cover the vast majority of predicate-argument relationships in existing datasets.

Answering Complex Open-domain Questions Through Iterative Query Generation

This work presents GoldEn (Gold Entity) Retriever, which iterates between reading context and retrieving more supporting documents to answer open-domain multi-hop questions, and demonstrates that it outperforms the best previously published model despite not using pretrained language models such as BERT.

ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters

This work introduces ComQA, a large dataset of real user questions that exhibit different challenging aspects such as compositionality, temporal reasoning, and comparisons, and demonstrates that the dataset can be a driver of future research on QA.

The Web as a Knowledge-Base for Answering Complex Questions

This paper proposes decomposing complex questions into a sequence of simple questions and computing the final answer from the sequence of answers, and empirically demonstrates that question decomposition improves performance from 20.8 to 27.5 precision@1 on this new dataset.

Search-based Neural Structured Learning for Sequential Question Answering

This work proposes a novel dynamic neural semantic parsing framework trained using a weakly supervised reward-guided search that effectively leverages the sequential context to outperform state-of-the-art QA systems that are designed to answer highly complex questions.

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

It is shown that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

We introduce GQA, a new dataset for real-world visual reasoning and compositional question answering, seeking to address key shortcomings of previous VQA datasets.