A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers

@inproceedings{Dasigi2021ADO,
  title={A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers},
  author={Pradeep Dasigi and Kyle Lo and Iz Beltagy and Arman Cohan and Noah A. Smith and Matt Gardner},
  booktitle={NAACL},
  year={2021}
}
Readers of academic research papers often read with the goal of answering specific questions. Question Answering systems that can answer those questions can make consumption of the content much more efficient. However, building such tools requires data that reflect the difficulty of the task arising from complex reasoning about claims made in multiple parts of a paper. In contrast, existing information-seeking question answering datasets usually contain questions about generic factoid-type… 
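For readers who want to probe the released data directly, the following is a minimal sketch, assuming the dataset is mirrored on the Hugging Face Hub as allenai/qasper; the field names follow the public release and should be treated as assumptions rather than details taken from this page.

```python
# Minimal sketch of exploring QASPER via the Hugging Face datasets library.
# Assumes the hub id "allenai/qasper"; field names follow the public release.
from datasets import load_dataset

qasper = load_dataset("allenai/qasper", split="train")
paper = qasper[0]

print(paper["title"])
print(len(paper["qas"]["question"]), "information-seeking questions for this paper")

# Each question carries one or more annotated answers, which may be extractive
# spans, free-form text, yes/no, or marked unanswerable, plus evidence.
first = paper["qas"]["answers"][0]["answer"][0]
print(first["free_form_answer"] or first["extractive_spans"] or "unanswerable")
```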

Citations

ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers
TLDR
It is shown that ConditionalQA is challenging for many of the existing QA models, especially in selecting answer conditions, and will motivate further research in answering complex questions over long documents.
A Survey on Multi-hop Question Answering and Generation
TLDR
A general and formal definition of the MHQA task is provided, existing approaches to this highly interesting yet quite challenging task are summarized, and the best methods for creating MHQA datasets are outlined.
End-to-End Multihop Retrieval for Compositional Question Answering over Long Documents
TLDR
This paper proposes a multihop retrieval method, DOCHOPPER, to answer compositional questions over long documents, and demonstrates that utilizing document structure in this way can largely improve question-answering and retrieval performance on long documents.
Modern Question Answering Datasets and Benchmarks: A Survey
TLDR
Two of the most common QA tasks, textual question answering and visual question answering, are introduced separately, covering the most representative datasets, and some current challenges of QA research are outlined.
QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension
TLDR
The largest survey to date of question answering and reading comprehension resources, providing an overview of the formats and domains of current datasets and highlighting lacunae for future work.
Iterative Hierarchical Attention for Answering Complex Questions over Long Documents
TLDR
DOCHOPPER, a new model that iteratively attends to different parts of long, hierarchically structured documents to answer complex questions, achieves state-of-the-art results on three of the evaluated datasets and is efficient at inference time, being 3–10 times faster than the baselines.
Long Context Question Answering via Supervised Contrastive Learning
TLDR
This work proposes a novel method for equipping long-context QA models with a sequence-level objective for better identification of the supporting evidence, via an additional contrastive supervision signal in finetuning.
Utilizing Evidence Spans via Sequence-Level Contrastive Learning for Long-Context Question Answering
TLDR
This work proposes a novel method for equipping long-range transformers with a sequence-level objective for better identification of supporting evidence spans, via an additional contrastive supervision signal in finetuning (a rough sketch of this idea follows the list below).
ArgSciChat: A Dataset for Argumentative Dialogues on Scientific Papers
TLDR
A novel framework is introduced for collecting dialogues between scientists as domain experts on scientific papers, letting scientists present their own papers as grounding for dialogues and join dialogues on papers whose titles they find interesting.
Flipping the Script: Inverse Information Seeking Dialogues for Market Research
TLDR
This work introduces and provides a formal definition of an inverse information seeking agent, outlines some of its unique challenges, and proposes a novel framework to tackle this problem based on techniques from natural language processing (NLP) and IIR.
...
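The two contrastive-learning entries above describe the same basic recipe: finetune a long-context QA model with an extra sequence-level contrastive objective that helps it identify supporting evidence. Below is a minimal sketch of that general idea, not the authors' implementation; the function name, shapes, and temperature are illustrative assumptions.

```python
# Sketch of a sequence-level contrastive objective for evidence identification:
# pull the question representation toward encoder states of annotated evidence
# sentences and away from non-evidence sentences (trained alongside the usual
# answer loss). Not the authors' implementation; names and shapes are assumed.
import torch
import torch.nn.functional as F

def evidence_contrastive_loss(question_vec, sentence_vecs, evidence_mask, temperature=0.1):
    """question_vec: (d,); sentence_vecs: (n, d); evidence_mask: (n,) bool."""
    sims = F.cosine_similarity(question_vec.unsqueeze(0), sentence_vecs, dim=-1) / temperature
    log_probs = F.log_softmax(sims, dim=-1)
    # Maximize the probability mass the model assigns to evidence sentences.
    return -log_probs[evidence_mask].mean()

# Hypothetical shapes: 8 candidate sentences, 2 of them annotated as evidence.
q = torch.randn(768)
sents = torch.randn(8, 768)
mask = torch.zeros(8, dtype=torch.bool)
mask[2] = mask[5] = True
loss = evidence_contrastive_loss(q, sents, mask)
```

In practice this term would be added, with some weight, to the usual span-extraction or generation loss during finetuning.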

References

Showing 1–10 of 39 references
IIRC: A Dataset of Incomplete Information Reading Comprehension Questions
TLDR
A dataset with more than 13K questions over paragraphs from English Wikipedia that provide only partial information to answer them, with the missing information occurring in one or more linked documents; a baseline model achieves 31.1% F1 on this task, while estimated human performance is 88.4%.
Natural Questions: A Benchmark for Question Answering Research
TLDR
The Natural Questions corpus, a question answering data set, is presented, introducing robust metrics for the purposes of evaluating question answering systems; demonstrating high human upper bounds on these metrics; and establishing baseline results using competitive methods drawn from related literature.
RikiNet: Reading Wikipedia Pages for Natural Question Answering
TLDR
This paper introduces a new model, called RikiNet, which reads Wikipedia pages for natural question answering and is the first single model to outperform single human performance.
WikiQA: A Challenge Dataset for Open-Domain Question Answering
TLDR
The WIKIQA dataset is described, a new publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering, which is more than an order of magnitude larger than the previous dataset.
QuAC: Question Answering in Context
TLDR
QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context, as shown in a detailed qualitative evaluation.
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
TLDR
This new dataset is aimed to overcome a number of well-known weaknesses of previous publicly available datasets for the same task of reading comprehension and question answering, and is the most comprehensive real-world dataset of its kind in both quantity and quality.
CoQA: A Conversational Question Answering Challenge
TLDR
CoQA is introduced, a novel dataset for building Conversational Question Answering systems and it is shown that conversational questions have challenging phenomena not present in existing reading comprehension datasets (e.g., coreference and pragmatic reasoning).
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
TLDR
A quantitative analysis of the data quality and example-level qualitative linguistic analyses of observed language phenomena that would not be found in English-only corpora are presented.
SQuAD: 100,000+ Questions for Machine Comprehension of Text
TLDR
A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%); see the sketch after this list for how this token-overlap F1 metric is computed.
The NarrativeQA Reading Comprehension Challenge
TLDR
A new dataset and set of tasks in which the reader must answer questions about stories by reading entire books or movie scripts are presented, designed so that successfully answering their questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience.
...
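Several of the reference summaries above quote answer F1 scores (51.0% for SQuAD's logistic regression baseline, 31.1% for IIRC's baseline against 88.4% human performance). For extractive QA benchmarks in the SQuAD family, this is the token-overlap F1 between the predicted and gold answer strings; a minimal sketch follows (whitespace tokenization only; the official evaluation scripts also lowercase and strip punctuation and articles).

```python
# Token-overlap F1 between a predicted and a gold answer string, in the style
# of the SQuAD evaluation metric (simplified: whitespace tokenization only).
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    pred_tokens, gold_tokens = prediction.split(), gold.split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the NAACL conference", "NAACL conference"))  # 0.8
```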