Gender and Racial Bias in Visual Question Answering Datasets

@inproceedings{hirota2022gender,
  title={Gender and Racial Bias in Visual Question Answering Datasets},
  author={Yusuke Hirota and Yuta Nakashima and Noa Garc{\'i}a},
  booktitle={2022 ACM Conference on Fairness, Accountability, and Transparency},
  year={2022}
}
Vision-and-language tasks have drawn increasing attention as a means to evaluate human-like reasoning in machine learning models. A popular task in the field is visual question answering (VQA), which aims to answer questions about images. However, VQA models have been shown to exploit language bias by learning the statistical correlations between questions and answers without looking into the image content: e.g., questions about the color of a banana are answered with yellow, even if the…
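The language-bias shortcut described above can be illustrated with a question-only baseline that ignores the image entirely. The sketch below is hypothetical (the function names and the crude question-type key are assumptions, not the authors' method): it maps each question type to its most frequent training answer and predicts from that prior alone.

```python
from collections import Counter, defaultdict

def train_language_prior(qa_pairs):
    """Map each question 'type' (first few words) to its most frequent
    answer, ignoring images entirely -- a sketch of the language-prior
    shortcut, not the paper's model."""
    answers_by_type = defaultdict(Counter)
    for question, answer in qa_pairs:
        qtype = " ".join(question.lower().split()[:4])  # crude type key
        answers_by_type[qtype][answer] += 1
    return {qtype: counts.most_common(1)[0][0]
            for qtype, counts in answers_by_type.items()}

def predict(prior, question, default="yes"):
    """Answer from the question prior alone; no image is consulted."""
    qtype = " ".join(question.lower().split()[:4])
    return prior.get(qtype, default)

# Toy training set: banana-color questions are overwhelmingly "yellow".
train = [("What color is the banana?", "yellow")] * 9 + \
        [("What color is the banana?", "green")]
prior = train_language_prior(train)

# The baseline answers "yellow" for ANY banana-color question,
# regardless of what the pictured banana actually looks like.
print(predict(prior, "What color is the banana?"))  # yellow
```

Such a baseline can score surprisingly well on skewed benchmarks, which is exactly why datasets like VQA v2 (below) pair each question with images yielding different answers.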


Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
This paper addresses the task of knowledge-based visual question answering and provides a benchmark, called OK-VQA, where the image content is not sufficient to answer the questions, encouraging methods that rely on external knowledge resources.
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
We introduce GQA, a new dataset for real-world visual reasoning and compositional question answering, seeking to address key shortcomings of previous VQA datasets. We have developed a strong and…
Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification
In commercial API-based classifiers of gender from facial images, including IBM Watson Visual Recognition, the highest error rates are shown to involve images of darker-skinned women, while the most accurate results are for lighter-skinned men.
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
This work balances the popular VQA dataset by collecting complementary images such that every question in the authors' balanced dataset is associated with not just a single image, but rather a pair of similar images that result in two different answers to the question.
VQA: Visual Question Answering
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
Understanding and Evaluating Racial Biases in Image Captioning
Differences in caption performance, sentiment, and word choice between images of lighter- versus darker-skinned people are found to be greater in modern captioning systems than in older ones, raising concerns that, without proper consideration and mitigation, these differences will only become more prevalent.
Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations
It is shown that trained models significantly amplify the association of target labels with gender beyond what one would expect from biased datasets, and an adversarial approach is adopted to remove unwanted features corresponding to protected variables from intermediate representations in a deep neural network.
Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
This work proposes to inject corpus-level constraints for calibrating existing structured prediction models and design an algorithm based on Lagrangian relaxation for collective inference to reduce the magnitude of bias amplification in multilabel object classification and visual semantic role labeling.
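The bias-amplification idea behind the two entries above can be sketched as a simple co-occurrence comparison between training data and model predictions. This is an illustrative sketch only, with assumed function names and a two-group toy setup, not the corpus-level constraint method the paper proposes:

```python
def bias_score(pairs, label, group):
    """Fraction of (label, group) pairs in which `label` co-occurs
    with `group`; pairs are (label, gender) tuples."""
    with_group = sum(1 for l, g in pairs if l == label and g == group)
    total = sum(1 for l, g in pairs if l == label)
    return with_group / total if total else 0.0

def amplification(train_pairs, pred_pairs, labels, group):
    """Mean increase of the bias score from training data to model
    predictions; positive values mean the model amplified the skew."""
    deltas = [bias_score(pred_pairs, l, group)
              - bias_score(train_pairs, l, group) for l in labels]
    return sum(deltas) / len(deltas)

# Toy data: "shopping" co-occurs with women 2/3 of the time in training,
# but the model predicts it for women 9/10 of the time.
train = [("shopping", "woman")] * 2 + [("shopping", "man")] * 1
pred  = [("shopping", "woman")] * 9 + [("shopping", "man")] * 1
print(round(amplification(train, pred, ["shopping"], "woman"), 2))  # 0.23
```

A positive amplification score on held-out predictions is the signal that motivates calibration techniques such as the corpus-level constraints described above.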
YFCC100M: the new data in multimedia research
This publicly available curated dataset of almost 100 million photos and videos is free and legal for all to use.