Measuring Model Biases in the Absence of Ground Truth

Osman Aka, Ken Burke, Alex Bauerle, Christina Greer, Margaret Mitchell
Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society
The measurement of bias in machine learning often focuses on model performance across identity subgroups (such as man and woman) with respect to ground-truth labels. However, these methods do not directly measure the associations that a model may have learned, for example between labels and identity subgroups. Further, measuring a model's bias requires a fully annotated evaluation dataset, which may not be easily available in practice. We present an elegant mathematical solution that tackles both…

Measuring Data

This work identifies the task of measuring data, that is, quantitatively characterizing the composition of machine learning data and datasets, and motivates it as a critical component of responsible AI development.

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

This work presents Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding, and finds that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment.

The Undesirable Dependence on Frequency of Gender Bias Metrics Based on Word Embeddings

This work studies the effect of frequency when measuring female vs. male gender bias with word embedding-based bias quantification methods and proves that the frequency-based effect observed in unshuffled corpora stems from properties of the metric rather than from word associations.

Handling Bias in Toxic Speech Detection: A Survey

Detecting online toxicity has always been a challenge due to its inherent subjectivity. Factors such as the context, geography, socio-political climate, and background of the producers and consumers

Social Norm Bias: Residual Harms of Fairness-Aware Algorithms

This work characterizes Social Norm Bias (SNoB), a subtle but consequential type of algorithmic discrimination that may be exhibited by machine learning models, even when these systems achieve group fairness objectives, by measuring how an algorithm's predictions are associated with conformity to inferred gender norms.

Fake it till you make it: Learning(s) from a synthetic ImageNet clone

It is shown that, with minimal and class-agnostic prompt engineering, ImageNet clones (which the authors denote ImageNet-SD) are able to close a large part of the gap between models trained with synthetic images and models trained with real images on the several standard classification benchmarks considered in this study.

Re-contextualizing Fairness in NLP: The Case of India

Recent research has revealed undesirable biases in NLP data and models. However, these efforts focus on social disparities in the West, and are not directly portable to other geo-cultural contexts. In

BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling

This work experiments with different sampling methods on the Spanish version of mC4 and presents a novel data-centric technique, named perplexity sampling, that enables the pre-training of language models in roughly half the number of steps and using one third of the data.

The Equity Framework: Fairness Beyond Equalized Predictive Outcomes

Machine Learning (ML) decision-making algorithms are now widely used in predictive decision-making, for example, to determine who to admit and give a loan. Their wide usage and consequential effects

Seeing without Looking: Analysis Pipeline for Child Sexual Abuse Datasets

It is argued that automatic signals can highlight important aspects of the overall distribution of data, which is valuable for databases that cannot be disclosed.

1. In psychological work the problem of comparing two different rankings of the same set of individuals may be divided into two types. In the first type the individuals have a given order A which is

Equality of Opportunity in Supervised Learning

This work proposes a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features, and shows how to optimally adjust any learned predictor so as to remove discrimination according to this definition.
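
The summary above does not spell out the criterion. As a minimal sketch, assuming the equal opportunity form (equal true positive rates across groups of the sensitive attribute), the check could look like the following. The function names `true_positive_rate` and `equal_opportunity_gap` are hypothetical and chosen only for illustration; this is not the authors' code.

```python
def true_positive_rate(y_true, y_pred):
    """Fraction of truly positive examples that were predicted positive."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_on_positives:
        raise ValueError("no positive examples in this group")
    return sum(preds_on_positives) / len(preds_on_positives)


def equal_opportunity_gap(y_true, y_pred, group):
    """Return (largest TPR difference between groups, per-group TPRs).

    Assumption: binary labels and predictions encoded as 0/1, and one
    sensitive-attribute value per example in `group`.
    """
    rates = {}
    for g in set(group):
        idx = [i for i, a in enumerate(group) if a == g]
        rates[g] = true_positive_rate(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
    return max(rates.values()) - min(rates.values()), rates


# Toy data, invented for illustration only.
y_true = [1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1]
group = ["a", "a", "a", "b", "b", "b"]
gap, rates = equal_opportunity_gap(y_true, y_pred, group)
```

A gap of zero would mean the predictor satisfies this form of the criterion on the data at hand. Note that the check needs ground-truth labels for every example.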

Word Association Norms, Mutual Information and Lexicography

The proposed measure, the association ratio, estimates word association norms directly from computer readable corpora, making it possible to estimate norms for tens of thousands of words.
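
The association ratio is closely related to pointwise mutual information, log2(P(x, y) / (P(x) P(y))). The following is a minimal sketch under simplifying assumptions that are not from the summary above: co-occurrence is counted per document rather than within a word window, and word pairs are treated as unordered. The function name `association_ratios` is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def association_ratios(documents):
    """Map each co-occurring word pair (x, y) to log2(P(x, y) / (P(x) P(y))).

    Assumption: probabilities are estimated as the fraction of documents
    containing the word (or both words of the pair).
    """
    n = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        words = sorted(set(doc))  # count each word once per document
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))
    return {
        (x, y): math.log2(
            (count / n) / ((word_counts[x] / n) * (word_counts[y] / n))
        )
        for (x, y), count in pair_counts.items()
    }


# Toy corpus, invented for illustration only.
docs = [
    ["doctor", "nurse"],
    ["doctor", "nurse"],
    ["bread", "butter"],
    ["doctor", "bread"],
]
ratios = association_ratios(docs)
```

Positive values indicate pairs that co-occur more often than independence would predict, and negative values indicate pairs that co-occur less often. Pairs are keyed in alphabetical order, for example `("bread", "butter")`.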

The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale

  • 2018

Handbook of semantic word norms

Distributional Structure

This work discusses how each language can be described in terms of a distributional structure, i.e. in terms of the occurrence of parts relative to other parts, and how this description is complete without intrusion of other features such as history or meaning.

Predictive Inequity in Object Detection

This work annotates an existing large-scale dataset which contains pedestrians with Fitzpatrick skin tones in ranges [1-3] or [4-6], and provides an in-depth comparative analysis of performance between these two skin tone groupings, finding that neither time of day nor occlusion explains this behavior.

ConvNets and ImageNet Beyond Accuracy: Understanding Mistakes and Uncovering Biases

It is experimentally demonstrated that the accuracy and robustness of ConvNets measured on ImageNet are vastly underestimated and that explanations can mitigate the impact of misclassified adversarial examples from the perspective of the end-user.

Women also Snowboard: Overcoming Bias in Captioning Models

A new Equalizer model is introduced that ensures equal gender probability when gender evidence is occluded in a scene and confident predictions when gender evidence is present. It has lower error than prior work when describing images with people and mentioning their gender, and more closely matches the ground truth ratio of sentences including women to sentences including men.