
Toward Deconfounding the Influence of Entity Demographics for Question Answering Accuracy

Maharshi Gor, Kellie Webster, Jordan L. Boyd-Graber
The goal of question answering (QA) is to answer any question. However, major QA datasets have skewed distributions over gender, profession, and nationality. Despite that skew, model accuracy analysis reveals little evidence that accuracy is lower for people based on gender or nationality; instead, accuracy varies more across professions (question topic). But QA's lack of representation could itself hide evidence of bias, necessitating QA datasets that better represent global diversity.
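The accuracy analysis described above amounts to disaggregating model correctness over demographic attributes of the answer entity. A minimal sketch of that disaggregation follows; the data layout and attribute names are illustrative assumptions, not the paper's actual pipeline.

```python
from collections import defaultdict

def accuracy_by_attribute(examples, attribute):
    """Disaggregate QA accuracy over a demographic attribute of the answer entity.

    `examples` is assumed to be a list of dicts with a boolean `correct` field
    and entity metadata fields such as `gender`, `nationality`, `profession`.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for ex in examples:
        group = ex[attribute]
        totals[group] += 1
        correct[group] += int(ex["correct"])
    return {g: correct[g] / totals[g] for g in totals}

examples = [
    {"correct": True, "profession": "scientist"},
    {"correct": False, "profession": "scientist"},
    {"correct": True, "profession": "artist"},
]
print(accuracy_by_attribute(examples, "profession"))
# {'scientist': 0.5, 'artist': 1.0}
```

Comparing these per-group accuracies (while controlling for confounds such as question topic) is the kind of analysis the abstract refers to.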

References


What's in a Name? Answer Equivalence For Open-Domain Question Answering
This work explores mining alias entities from knowledge bases and using them as additional gold answers (i.e., equivalent answers) in two settings: evaluation with additional answers and model training with equivalent answers.
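Evaluation with alias answers, as described in this entry, amounts to accepting a prediction if it matches any member of an expanded gold-answer set. A minimal sketch, where the normalization rules are simplifying assumptions:

```python
import string

def normalize(text):
    # Lowercase, strip punctuation, and collapse whitespace (assumed normalization).
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def exact_match_with_aliases(prediction, gold_answers):
    """Return True if the prediction matches any gold answer or mined alias."""
    pred = normalize(prediction)
    return any(pred == normalize(gold) for gold in gold_answers)

# Aliases mined from a knowledge base widen the accepted answer set.
golds = ["United Kingdom", "UK", "Great Britain"]
print(exact_match_with_aliases("U.K.", golds))
# True
```

Without the alias "UK" in the gold set, a surface-form mismatch like this would be scored as wrong, which is the evaluation gap the work targets.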
What Question Answering can Learn from Trivia Nerds
It is argued that creating a QA dataset (and the ubiquitous leaderboard that goes with it) closely resembles running a trivia tournament, yet the hard-learned lessons from decades of the trivia community creating vibrant, fair, and effective question answering competitions are ignored.
SQuAD: 100,000+ Questions for Machine Comprehension of Text
A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).
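SQuAD-style evaluation reports the token-overlap F1 quoted in this entry alongside exact match. A sketch of the standard F1 computation, with whitespace tokenization as a simplifying assumption:

```python
from collections import Counter

def token_f1(prediction, gold):
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the Eiffel Tower", "Eiffel Tower"))
# 0.8
```

F1 gives partial credit for near-miss spans, which is why it is the headline metric for extractive reading comprehension.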
Latent Retrieval for Weakly Supervised Open Domain Question Answering
It is shown for the first time that the retriever and reader can be learned jointly from question-answer string pairs, without any IR system, outperforming BM25 by up to 19 points in exact match.
Natural Questions: A Benchmark for Question Answering Research
The Natural Questions corpus, a question answering data set, is presented, introducing robust metrics for the purposes of evaluating question answering systems; demonstrating high human upper bounds on these metrics; and establishing baseline results using competitive methods drawn from related literature.
UNQOVERing Stereotypical Biases via Underspecified Questions
UNQOVER, a general framework to probe and quantify biases through underspecified questions, is presented, showing that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors: positional dependence and question independence.
Predicting a Scientific Community’s Response to an Article
It is demonstrated that text features significantly improve accuracy of predictions over metadata features like authors, topical categories, and publication venues.
What Makes Reading Comprehension Questions Easier?
This study proposes simple heuristics to split each dataset into easy and hard subsets and examines the performance of two baseline models on each subset, observing that baseline performance on the hard subsets degrades markedly compared to that on the full datasets.
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
It is shown that, in comparison to other recently introduced large-scale datasets, TriviaQA has relatively complex, compositional questions, exhibits considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and requires more cross-sentence reasoning to find answers.
MLQA: Evaluating Cross-lingual Extractive Question Answering
This work presents MLQA, a multi-way aligned extractive QA evaluation benchmark intended to spur research in this area, and evaluates state-of-the-art cross-lingual models and machine-translation-based baselines on MLQA.