The Values Encoded in Machine Learning Research

Abeba Birhane, Pratyusha Kalluri, Dallas Card, William Agnew, Ravit Dotan, and Michelle Bao. 2022 ACM Conference on Fairness, Accountability, and Transparency.
Machine learning currently exerts an outsized influence on the world, increasingly affecting institutional practices and impacted communities. It is therefore critical that we question vague conceptions of the field as value-neutral or universally beneficial, and investigate what specific values the field is advancing. In this paper, we first introduce a method and annotation scheme for studying the values encoded in documents such as research papers. Applying the scheme, we analyze 100 highly… 


Evaluation Gaps in Machine Learning Practice

The evaluation gaps between the idealized breadth of evaluation concerns and the observed narrow focus of actual evaluations are examined, pointing the way towards more contextualized evaluation methodologies for robustly examining the trustworthiness of ML models.

Studying Up Machine Learning Data: Why Talk About Bias When We Mean Power?

This commentary proposes moving the research focus beyond bias-oriented framings by adopting a power-aware perspective to "study up" ML datasets, which means accounting for historical inequities, labor conditions, and epistemological standpoints inscribed in data.


It is claimed that in order to regulate AI tools and evaluate their reliability, agencies need an explanation of how ML tools have been built, which requires documenting and justifying the technical choices that practitioners have made in designing such tools.

Square One Bias in NLP: Towards a Multi-Dimensional Exploration of the Research Manifold

Historical and recent examples are provided of how the square one bias has led researchers to draw false conclusions or make unwise choices; promising yet unexplored directions on the research manifold are pointed to, and practical recommendations are made to enable more multi-dimensional research.

Neural Language Models are Effective Plagiarists

It is found that a student using GPT-J can complete introductory level programming assignments without triggering suspicion from MOSS, a widely used plagiarism detection tool.

Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

This work collects and introduces diverse non-homophilous datasets from a variety of application areas, with up to 384x more nodes and 1398x more edges than prior datasets, and introduces LINKX, a strong and simple method that admits straightforward minibatch training and inference.

Fooling MOSS Detection with Pretrained Language Models

It is found that a student using GPT-J can complete introductory level programming assignments without triggering suspicion from MOSS, a widely used software similarity and plagiarism detection tool.

The games we play: critical complexity improves machine learning

It is argued that best practice in ML should align more with critical complexity perspectives than with rationalist grand narratives; this approach is termed Open Machine Learning (Open ML) and is contrasted with two forms of grand narratives in ML.

REAL ML: Recognizing, Exploring, and Articulating Limitations of Machine Learning Research

Transparency around limitations can improve the scientific rigor of research, help ensure appropriate interpretation of research findings, and make research claims more credible.



Value-laden disciplinary shifts in machine learning

A conceptual framework is developed to evaluate the process through which types of machine learning models become predominant, and it is argued that the rise of a model-type is self-reinforcing: it influences the way model-types are evaluated.

Machine Learning that Matters

This work presents six Impact Challenges to explicitly focus the field of machine learning's energy and attention, and discusses existing obstacles that must be addressed.

A Framework for Understanding Unintended Consequences of Machine Learning

This paper provides a framework that partitions sources of downstream harm in machine learning into six distinct categories spanning the data generation and machine learning pipeline, and describes how these issues arise, how they are relevant to particular applications, and how they motivate different solutions.

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜

Recommendations are provided, including weighing the environmental and financial costs first, investing resources into curating and carefully documenting datasets rather than ingesting everything on the web, and carrying out pre-development exercises that evaluate how the planned approach fits into research and development goals and supports stakeholder values.

NAS-Bench-101: Towards Reproducible Neural Architecture Search

This work introduces NAS-Bench-101, the first public architecture dataset for NAS research, which allows researchers to evaluate the quality of a diverse range of models in milliseconds by querying the pre-computed dataset.

Improving Fairness in Machine Learning Systems: What Do Industry Practitioners Need?

This first systematic investigation of commercial product teams' challenges and needs for support in developing fairer ML systems identifies areas of alignment and disconnect between the challenges faced by teams in practice and the solutions proposed in the fair ML research literature.

Unpacking the Expressed Consequences of AI Research in Broader Impact Statements

A qualitative thematic analysis of a sample of statements written for the NeurIPS 2020 conference identifies themes related to how consequences are expressed, areas of impacts expressed, and researchers' recommendations for mitigating negative consequences in the future.

Translated Learning: Transfer Learning across Different Feature Spaces

Through experiments on the text-aided image classification and cross-language classification tasks, it is demonstrated that the translated learning framework can greatly outperform many state-of-the-art baseline methods.

The De-democratization of AI: Deep Learning and the Compute Divide in Artificial Intelligence Research

It is suggested that a lack of access to specialized equipment such as compute can de-democratize knowledge production, increase concerns around bias and fairness within AI technology, and present an obstacle to "democratizing" AI.

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks

This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: a tighter characterization of training speed, an explanation for why training a neural net with random labels leads to slower training, and a data-dependent complexity measure.