Corpus ID: 208267284

How Do We Talk about Other People? Group (Un)Fairness in Natural Language Image Descriptions

@inproceedings{Otterbacher2019HowDW,
  title={How Do We Talk about Other People? Group (Un)Fairness in Natural Language Image Descriptions},
  author={Jahna Otterbacher and Pınar Barlas and Styliani Kleanthous and Kyriakos Kyriakou},
  booktitle={AAAI 2019},
  year={2019}
}
Crowdsourcing plays a key role in developing algorithms for image recognition or captioning. Major datasets, such as MS COCO or Flickr30K, have been built by eliciting natural language descriptions of images from workers. Yet such elicitation tasks are susceptible to human biases, including stereotyping people depicted in images. Given the growing concerns surrounding discrimination in algorithms, as well as in the data used to train them, it is necessary to take a critical look at this… 
Citations

Understanding and Evaluating Racial Biases in Image Captioning
TLDR: Differences in caption performance, sentiment, and word choice between images of lighter- versus darker-skinned people are found to be greater in modern captioning systems than in older ones, raising concerns that without proper consideration and mitigation these differences will only become more prevalent.
To "See" is to Stereotype
TLDR
A controlled experiment is designed, to examine the interdependence between algorithmic recognition of context and the depicted person's gender, and to create a highly controlled dataset of people images, imposed on gender-stereotyped backgrounds.
Person, Human, Neither: The Dehumanization Potential of Automated Image Tagging
TLDR: This work audits six proprietary image tagging algorithms (ITAs) for their potential to perpetuate dehumanization, highlights the subtle ways in which ITAs may inflict widespread, disparate harm, and emphasizes the importance of considering the social context of the resulting application.
It’s About Time: A View of Crowdsourced Data Before and During the Pandemic
TLDR: Analysis of the themes of Identity and Health conveyed in workers’ tags finds evidence supporting the potential for temporal sensitivity in crowdsourced data and relates the findings to emerging research on crowdworkers’ moods.
Computer Vision and Conflicting Values: Describing People with Automated Alt Text
TLDR: This paper analyzes the policies that Facebook has adopted with respect to identity categories such as race, gender, and age, and the company's decisions about whether to present these terms in alt text; it also describes an alternative, manual approach practiced in the museum community, focusing on how museums determine what to include in alt text descriptions of cultural artifacts.
How Do Image Description Systems Describe People? A Targeted Assessment of System Competence in the PEOPLE-domain
TLDR: This paper proposes a different kind of assessment, which quantifies the extent to which image description systems can describe humans, based on a manual characterization of English entity labels in the PEOPLE domain that determines the range of possible outputs.
Accounting for Confirmation Bias in Crowdsourced Label Aggregation
TLDR: This paper presents an algorithmic approach that infers the correct answers to tasks by aggregating annotations from multiple crowd workers while taking each worker's level of confirmation bias into account, and shows that the proposed bias-aware label aggregation algorithm outperforms baseline methods in accurately inferring ground-truth labels.
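
To make the aggregation idea concrete, here is a minimal, self-contained sketch of jointly estimating per-worker reliability and true labels with an EM-style loop, in the spirit of Dawid and Skene. It is not the paper's confirmation-bias model; the symmetric-reliability assumption and all names are illustrative only.

import numpy as np

def aggregate(labels, n_iters=20):
    """labels: (n_workers, n_tasks) array of 0/1 annotations."""
    n_workers, n_tasks = labels.shape
    reliability = np.full(n_workers, 0.8)  # uniform prior belief in each worker

    for _ in range(n_iters):
        # E-step: per-task log-odds that the true label is 1, weighting each
        # worker's vote by our current estimate of their reliability.
        log_odds = np.zeros(n_tasks)
        for w in range(n_workers):
            p = np.clip(reliability[w], 1e-6, 1 - 1e-6)
            vote = labels[w]
            log_odds += vote * np.log(p / (1 - p)) + (1 - vote) * np.log((1 - p) / p)
        truth = 1.0 / (1.0 + np.exp(-log_odds))

        # M-step: a worker's reliability is how often they agree with the
        # current hard estimate of the truth.
        hard = (truth > 0.5).astype(float)
        reliability = (labels == hard).mean(axis=1)

    return (truth > 0.5).astype(int), reliability
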
Multimodal datasets: misogyny, pornography, and malignant stereotypes
TLDR: The recently released LAION-400M dataset, a CLIP-filtered dataset of image-alt-text pairs parsed from the Common Crawl corpus, is examined and found to contain troublesome and explicit image-text pairs of rape, pornography, malign stereotypes, racist and ethnic slurs, and other extremely problematic content.
"That's in the eye of the beholder": Layers of Interpretation in Image Descriptions for Fictional Representations of People with Disabilities
TLDR
Five key themes are discovered along with an analysis of the layers of interpretation at work in the production and consumption of image descriptions for fictional representations of people with disabilities.
Designing and Optimizing Cognitive Debiasing Strategies for Crowdsourcing Annotation
TLDR: This position paper advocates leveraging cognitive debiasing strategies developed in the psychological literature to mitigate biases in the crowdsourced annotations that AI is trained on.

References

Showing 1-10 of 31 references
Social B(eye)as: Human and Machine Descriptions of People Images
TLDR: A method for auditing image tagging algorithms for social bias is presented, together with a dataset of human- and machine-produced tags; the dataset, typology, and vectorization method can be used to explore a range of research questions related to both algorithmic and human behaviors.
Fairness in Proprietary Image Tagging Algorithms: A Cross-Platform Audit on People Images
TLDR: It is shown that behaviors differ significantly across image tagging APIs: while some offer more interpretation of images, they may exhibit less fairness toward the depicted persons, misusing gender-related tags and/or making judgments about a person's physical appearance.
Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels
TLDR: This paper proposes an algorithm to decouple human reporting bias from the correct, visually grounded labels, and shows significant improvements over traditional algorithms for both image classification and image captioning, doubling the performance of existing methods in some cases.
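
The decoupling idea lends itself to a compact sketch: model the probability that a human mentions a concept as the probability that it is visually present times the probability that a human would report it given the context. The module below is a toy PyTorch rendition of that factorization; the names, shapes, and two-head structure are our assumptions, not the authors' architecture.

import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    def __init__(self, feat_dim, n_concepts):
        super().__init__()
        self.visual = nn.Linear(feat_dim, n_concepts)  # "is the concept there?"
        self.report = nn.Linear(feat_dim, n_concepts)  # "would a human mention it?"

    def forward(self, feats):
        p_visual = torch.sigmoid(self.visual(feats))
        p_report = torch.sigmoid(self.report(feats))
        # Train p_mention against the noisy human labels; at test time,
        # read off p_visual as the de-noised visual classifier.
        p_mention = p_visual * p_report
        return p_mention, p_visual
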
Crowdsourcing Stereotypes: Linguistic Bias in Metadata Generated via GWAP
TLDR: This work exposes the presence of gender-based stereotypes through linguistic biases, illustrates the forms in which they manifest, and raises important implications for those who design systems or train algorithms using data produced via games with a purpose (GWAP).
Social Cues, Social Biases: Stereotypes in Annotations on People Images
TLDR: This work considers the case of linguistic biases and their consequences for the words that crowdworkers use to describe people images in an annotation task, and shows evidence of these biases, which are exacerbated when an image’s "popular tags" are displayed.
Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
TLDR: This work proposes injecting corpus-level constraints to calibrate existing structured prediction models and designs an algorithm based on Lagrangian relaxation for collective inference, reducing the magnitude of bias amplification in multilabel object classification and visual semantic role labeling.
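
The quantity being constrained can be illustrated with a simplified version of the bias-amplification measurement: compare how gender-skewed each activity's training co-occurrences are against how skewed the model's predictions are. The snippet below uses a distance-from-parity variant over hypothetical (activity, gender) pairs; it is not the paper's exact metric or inference procedure.

from collections import Counter

def gender_ratio(pairs):
    """pairs: iterable of (activity, gender) tuples, gender in {'man', 'woman'}."""
    counts = Counter(pairs)
    activities = {a for a, _ in counts}
    return {
        a: counts[(a, "woman")] / (counts[(a, "woman")] + counts[(a, "man")])
        for a in activities
    }

def bias_amplification(train_pairs, pred_pairs):
    """Mean increase in distance from gender parity, training -> predictions."""
    train_b = gender_ratio(train_pairs)
    pred_b = gender_ratio(pred_pairs)
    shared = train_b.keys() & pred_b.keys()  # activities seen in both sets
    return sum(abs(pred_b[a] - 0.5) - abs(train_b[a] - 0.5) for a in shared) / len(shared)
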
Women also Snowboard: Overcoming Bias in Captioning Models
TLDR: A new Equalizer model is introduced that encourages equal gender probability when gender evidence is occluded in a scene and confident predictions when gender evidence is present; it has lower error than prior work when describing images of people and mentioning their gender, and more closely matches the ground-truth ratio of sentences mentioning women to sentences mentioning men.
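
The two Equalizer-style objectives can be condensed as follows: with the person region masked out, the model should be unsure about gendered words; with the full image, it should commit. The shapes and names below are our assumptions, loosely after the paper's appearance-confusion and confident losses, not the authors' code.

import torch

def confusion_loss(p_woman_masked, p_man_masked):
    # Appearance confusion: with gender evidence hidden, the gendered-word
    # probabilities should be indistinguishable.
    return (p_woman_masked - p_man_masked).abs().mean()

def confident_loss(p_woman_full, p_man_full, eps=1e-6):
    # Confidence: with the full image, the lower of the two gendered-word
    # probabilities should be small relative to the higher one.
    p_max = torch.maximum(p_woman_full, p_man_full)
    p_min = torch.minimum(p_woman_full, p_man_full)
    return (p_min / (p_max + eps)).mean()
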
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
TLDR: This work empirically demonstrates that its algorithms significantly reduce gender bias in embeddings while preserving useful properties such as the ability to cluster related concepts and to solve analogy tasks.
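
The central "hard debiasing" step is simple enough to sketch directly: estimate a gender direction and project it out of every word vector. For brevity this uses a single he/she seed pair; the paper identifies a gender subspace from several definitional pairs via PCA and additionally re-equalizes certain word pairs.

import numpy as np

def debias(vectors, he, she):
    """vectors: dict of word -> np.array; he/she: seed word vectors."""
    g = he - she
    g = g / np.linalg.norm(g)                 # unit gender direction
    out = {}
    for word, v in vectors.items():
        v_db = v - np.dot(v, g) * g           # project out the gender component
        out[word] = v_db / np.linalg.norm(v_db)
    return out
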
Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics (Extended Abstract)
TLDR: This work proposes framing sentence-based image annotation as the task of ranking a given pool of captions, and introduces a new benchmark collection consisting of 8,000 images, each paired with five different captions that provide clear descriptions of the salient entities and events.
Understanding and predicting importance in images
TLDR: This paper explores how a number of factors relate to human perception of importance, using what people describe as a proxy for importance, and builds models to predict what will be described about an image given either known image content or image content estimated automatically by recognition systems.