Measuring Model Biases in the Absence of Ground Truth

Osman Aka, Ken Burke, Alex Bauerle, Christina Greer, Margaret Mitchell
Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society
The measurement of bias in machine learning often focuses on model performance across identity subgroups (such as man and woman) with respect to ground-truth labels. However, these methods do not directly measure the associations that a model may have learned, for example between labels and identity subgroups. Further, measuring a model's bias requires a fully annotated evaluation dataset, which may not be easily available in practice. We present an elegant mathematical solution that tackles both…

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

This work presents Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding, and finds that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment.

Re-contextualizing Fairness in NLP: The Case of India

Recent research has revealed undesirable biases in NLP data and models. However, these efforts focus on social disparities in the West, and are not directly portable to other geo-cultural contexts. In

BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling

This work experiments with different sampling methods from the Spanish version of mC4, and presents a novel data-centric technique, named perplexity sampling, that enables the pre-training of language models in roughly half the number of steps and using one fifth of the data.
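The summary above does not say how perplexity sampling weights documents, so the following is only a minimal sketch under stated assumptions: each document already has a perplexity score from some reference language model, and documents are drawn with a Gaussian weight centered on the median perplexity. The function names, the `width` parameter, and the weighting itself are illustrative choices, not the method as published.

```python
import math
import random
import statistics


def perplexity_weights(perplexities, width=1.0):
    """Assumed weighting: Gaussian centered on the median perplexity.

    The interquartile range (scaled by `width`) acts as the standard
    deviation, so documents with extreme perplexity get small weights.
    """
    q1, median, q3 = statistics.quantiles(perplexities, n=4)
    spread = max(q3 - q1, 1e-12) * width  # guard against a zero range
    return [math.exp(-0.5 * ((p - median) / spread) ** 2) for p in perplexities]


def sample_documents(docs, perplexities, k, seed=0):
    """Draw k documents (with replacement) in proportion to their weights."""
    rng = random.Random(seed)
    weights = perplexity_weights(perplexities)
    return rng.choices(docs, weights=weights, k=k)


if __name__ == "__main__":
    docs = ["d0", "d1", "d2", "d3", "d4"]
    perplexities = [10, 20, 30, 40, 1000]  # hypothetical scores
    print(perplexity_weights(perplexities))
    print(sample_documents(docs, perplexities, k=3))
```

In this sketch the outlier document (perplexity 1000) receives a much smaller weight than documents near the median, which is the intuition behind filtering noisy web text by perplexity.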

The Equity Framework: Fairness Beyond Equalized Predictive Outcomes

Machine Learning (ML) decision-making algorithms are now widely used in predictive decision-making, for example, to determine who to admit and give a loan. Their wide usage and consequential effects

Seeing without Looking: Analysis Pipeline for Child Sexual Abuse Datasets

It is argued that automatic signals can highlight important aspects of the overall distribution of data, which is valuable for databases that cannot be disclosed.

Algorithmic fairness datasets: the story so far

This work surveys over two hundred datasets employed in algorithmic fairness research, and produces standardized and searchable documentation for each of them, rigorously identifying the three most popular fairness datasets, namely Adult, COMPAS, and German Credit, for which this unifying documentation effort supports multiple contributions.

Visual Identification of Problematic Bias in Large Label Spaces

Different models and datasets for large label spaces can be systematically and visually analyzed and compared to make informed fairness assessments tackling problematic bias, and the approach can be integrated into classical model and data pipelines.

Scaling Vision Transformers

A ViT model with two billion parameters is successfully trained, which attains a new state-of-the-art on ImageNet of 90.45% top-1 accuracy and performs well for few-shot transfer.

Social Norm Bias: Residual Harms of Fairness-Aware Algorithms

This work characterizes Social Norm Bias (SNoB), a subtle but consequential type of algorithmic discrimination that may be exhibited by machine learning models, even when these systems achieve group fairness objectives, by measuring how an algorithm's predictions are associated with conformity to inferred gender norms.




1. In psychological work the problem of comparing two different rankings of the same set of individuals may be divided into two types. In the first type the individuals have a given order A which is

Equality of Opportunity in Supervised Learning

This work proposes a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features, and shows how to optimally adjust any learned predictor so as to remove discrimination according to this definition.
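The criterion named in the title, equality of opportunity, is commonly stated as requiring equal true positive rates across groups defined by the sensitive attribute. The sketch below measures how far a binary predictor is from that condition. The function names and toy data are hypothetical, and the post-processing step that adjusts a predictor is not shown.

```python
from collections import defaultdict


def true_positive_rates(y_true, y_pred, groups):
    """Return the true positive rate per group, over examples with y_true == 1."""
    positives = defaultdict(int)
    hits = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            positives[g] += 1
            if p == 1:
                hits[g] += 1
    return {g: hits[g] / positives[g] for g in positives}


def equal_opportunity_gap(y_true, y_pred, groups):
    """Largest difference in true positive rate between any two groups."""
    rates = true_positive_rates(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    y_true = [1, 1, 0, 1, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 1]
    groups = ["a", "a", "a", "b", "b", "b"]
    print(true_positive_rates(y_true, y_pred, groups))
    print(equal_opportunity_gap(y_true, y_pred, groups))
```

A gap of zero means the predictor satisfies the condition on this sample. Groups with no positive examples are left out of the result, since their true positive rate is undefined.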

Word Association Norms, Mutual Information and Lexicography

The proposed measure, the association ratio, estimates word association norms directly from computer readable corpora, making it possible to estimate norms for tens of thousands of words.
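As a rough illustration, an association score of this kind can be computed as pointwise mutual information, log2(P(x, y) / (P(x) P(y))), with probabilities estimated from counts. The sketch below assumes the ordered word pairs have already been extracted from a corpus. How pairs are extracted (for example, the co-occurrence window) is left out, and the function name is hypothetical.

```python
import math
from collections import Counter


def association_ratios(pairs):
    """Score each observed (x, y) pair as log2(P(x, y) / (P(x) * P(y))).

    Probabilities are relative frequencies over the given list of pairs.
    """
    pair_counts = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    n = len(pairs)
    return {
        (x, y): math.log2((c / n) / ((x_counts[x] / n) * (y_counts[y] / n)))
        for (x, y), c in pair_counts.items()
    }


if __name__ == "__main__":
    pairs = [
        ("strong", "tea"), ("strong", "tea"),
        ("powerful", "car"), ("powerful", "car"),
    ]
    for pair, score in association_ratios(pairs).items():
        print(pair, score)
```

Positive scores mean the two words occur together more often than their individual frequencies would predict. Scores from very small counts are unreliable, so a minimum-count threshold is usually sensible in practice.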

The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale

  • 2018

Handbook of semantic word norms

Distributional Structure

This work discusses how each language can be described in terms of a distributional structure, i.e. in terms of the occurrence of parts relative to other parts, and how this description is complete without intrusion of other features such as history or meaning.

Predictive Inequity in Object Detection

This work annotates an existing large-scale dataset which contains pedestrians with Fitzpatrick skin tones in ranges [1-3] or [4-6], and provides an in-depth comparative analysis of performance between these two skin tone groupings, finding that neither time of day nor occlusion explains this behavior.

ConvNets and ImageNet Beyond Accuracy: Understanding Mistakes and Uncovering Biases

It is experimentally demonstrated that the accuracy and robustness of ConvNets measured on ImageNet are vastly underestimated and that explanations can mitigate the impact of misclassified adversarial examples from the perspective of the end-user.

Women also Snowboard: Overcoming Bias in Captioning Models

A new Equalizer model is introduced that ensures equal gender probability when gender evidence is occluded in a scene and confident predictions when gender evidence is present. It has lower error than prior work when describing images with people and mentioning their gender, and more closely matches the ground-truth ratio of sentences including women to sentences including men.