Corpus ID: 232478869

Model Selection's Disparate Impact in Real-World Deep Learning Applications

@article{Forde2021ModelSD,
  title={Model Selection's Disparate Impact in Real-World Deep Learning Applications},
  author={Jessica Zosa Forde and A. Feder Cooper and Kweku Kwegyir-Aggrey and Chris De Sa and Michael L. Littman},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.00606}
}
Algorithmic fairness has emphasized the role of biased data in automated decision outcomes. Recently, there has been a shift in attention to sources of bias that implicate fairness in other stages in the ML pipeline. We contend that one source of such bias, human preferences in model selection, remains under-explored in terms of its role in disparate impact across demographic groups. Using a deep learning model trained on real-world medical imaging data, we verify our claim empirically and… 
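
The abstract (truncated above) points at the paper's core concern: choosing among trained models by a single aggregate metric can quietly sacrifice performance on particular demographic groups. Below is a minimal illustrative sketch of that failure mode using synthetic labels and two hypothetical candidate models; it is not the authors' code or data.

```python
# Illustrative sketch (synthetic data): model selection by aggregate
# validation accuracy alone can hide large subgroup disparities.
import numpy as np

def subgroup_accuracy(y_true, y_pred, groups):
    """Overall accuracy plus accuracy broken out by demographic subgroup."""
    overall = float(np.mean(y_true == y_pred))
    per_group = {g: float(np.mean(y_true[groups == g] == y_pred[groups == g]))
                 for g in np.unique(groups)}
    return overall, per_group

rng = np.random.default_rng(0)
n = 5000
y_true = rng.integers(0, 2, size=n)
groups = np.where(rng.random(n) < 0.8, "A", "B")  # group B is underrepresented

def simulate_predictions(acc_by_group):
    """Flip the true label at per-group error rates (stand-in for a trained model)."""
    correct = rng.random(n) < np.where(groups == "A", acc_by_group["A"], acc_by_group["B"])
    return np.where(correct, y_true, 1 - y_true)

candidates = {
    "seed_1": simulate_predictions({"A": 0.84, "B": 0.84}),  # uniform accuracy
    "seed_2": simulate_predictions({"A": 0.90, "B": 0.72}),  # higher overall, worse on B
}

# Picking the candidate with the best aggregate accuracy ("seed_2") silently
# trades away performance on the underrepresented group.
for name, preds in candidates.items():
    overall, per_group = subgroup_accuracy(y_true, preds, groups)
    print(name, round(overall, 3), {g: round(a, 3) for g, a in per_group.items()})

best = max(candidates, key=lambda m: subgroup_accuracy(y_true, candidates[m], groups)[0])
print("selected by overall accuracy:", best)
```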

Citations

Achieving Downstream Fairness with Geometric Repair

It is argued that fairer classification outcomes can be produced through the development of setting-specific interventions, and it is shown that attaining distributional parity minimizes rate disparities across all thresholds in the up/downstream setting.
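
As a rough illustration of the distributional-parity idea summarized above, the sketch below performs a full "repair" by mapping each group's scores onto a common target distribution via within-group rank matching. The target here is the pooled score distribution, a simplification of the paper's geometric repair, and all scores are synthetic.

```python
# Sketch of full distributional repair via rank matching (simplified; not the
# paper's exact geometric repair, which interpolates toward a reference
# distribution with a repair parameter).
import numpy as np

def full_repair(scores, groups):
    """Replace each score with the pooled-score quantile at its within-group rank."""
    repaired = np.empty_like(scores, dtype=float)
    pooled = np.sort(scores)
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        ranks = scores[idx].argsort().argsort() / max(len(idx) - 1, 1)  # ranks in [0, 1]
        repaired[idx] = np.quantile(pooled, ranks)
    return repaired

rng = np.random.default_rng(0)
groups = np.repeat(np.array(["A", "B"]), 500)
scores = np.concatenate([rng.normal(0.6, 0.10, 500), rng.normal(0.4, 0.15, 500)])

repaired = full_repair(scores, groups)
for name, s in [("raw", scores), ("repaired", repaired)]:
    rates = {g: round(float((s[groups == g] > 0.5).mean()), 3) for g in ["A", "B"]}
    print(name, "selection rates at threshold 0.5:", rates)
```

Once the per-group score distributions coincide, any single decision threshold yields (approximately) equal selection rates across groups, which is the sense in which distributional parity controls rate disparities at every threshold.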

Emergent Unfairness in Algorithmic Fairness-Accuracy Trade-Off Research

The intended goal of this work may be to improve the fairness of machine learning models, but it is argued that unexamined, implicit assumptions can in fact result in emergent unfairness.

Hyperparameter Optimization Is Deceiving Us, and How to Stop It

It is shown that the choice of hyperparameter subspace to search can deceive you and that grid search is inherently deceptive, and a defense with provable guarantees against such deception is presented.

Evaluation Beyond Task Performance: Analyzing Concepts in AlphaZero in Hex

This work investigates AlphaZero’s internal representations in the game of Hex using two evaluation techniques from natural language processing (NLP): model probing and behavioral tests, and finds that MCTS discovers concepts before the neural network learns to encode them.

Non-Determinism and the Lawlessness of Machine Learning Code

It is shown that the effects of non-determinism, and consequently its implications for the law, become clearer when ML outputs are reasoned about as distributions over possible outcomes; this distributional viewpoint accounts for randomness by emphasizing the range of possible outcomes of ML.

A Survey of Fairness in Medical Image Analysis: Concepts, Algorithms, Evaluations, and Challenges

This paper gives a comprehensive and precise definition of fairness, introduces techniques currently used to address fairness issues in medical image analysis (MedIA), lists public medical image datasets that contain demographic attributes to facilitate fairness research, and summarizes current algorithms concerning fairness in MedIA.

Towards a Standard for Identifying and Managing Bias in Artificial Intelligence

To successfully manage the risks of AI bias, values must be operationalized and new norms created around how AI is built and deployed, according to experts in the area of Trustworthy and Responsible AI.

Non-Determinism and the Lawlessness of ML Code

It is demonstrated that ML code falls outside the cyberlaw frame of treating “code as law,” since that frame assumes code is deterministic; the law must therefore do work to bridge the gap between its current individual-outcome focus and the distributional approach that is recommended.

Confronting Bias: BSA’s Framework to Build Trust in AI

The effort of the Office of Science and Technology Policy (OSTP) to create a “Bill of Rights for an Automated Society” and to engage the public in “National Policymaking about AI and Equity” is commendable.

Accountability in an Algorithmic Society: Relationality, Responsibility, and Robustness in Machine Learning

This analysis brings together recent scholarship on relational accountability frameworks and discusses how existing barriers make it difficult to instantiate a unified moral, relational framework in practice for data-driven algorithmic systems, uncovering new challenges for accountability that these systems present.

References

Showing 1–10 of 36 references

CheXclusion: Fairness gaps in deep chest X-ray classifiers

It is demonstrated that TPR disparities exist in state-of-the-art classifiers across all datasets, clinical tasks, and subgroups, and that a multi-source dataset corresponds to the smallest disparities, suggesting one way to reduce bias.
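
A minimal sketch of the kind of audit this summary describes: computing the true positive rate (TPR) separately per subgroup and the gap between them. The labels, predictions, and group names below are hypothetical placeholders rather than CheXclusion's data.

```python
# Per-subgroup TPR and the largest gap between subgroups (toy data).
import numpy as np

def tpr_by_group(y_true, y_pred, groups):
    """TPR (recall on the positive class) computed separately for each subgroup."""
    out = {}
    for g in np.unique(groups):
        mask = (groups == g) & (y_true == 1)
        out[g] = float(np.mean(y_pred[mask])) if mask.any() else float("nan")
    return out

y_true = np.array([1, 1, 0, 1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 0, 1, 1])
groups = np.array(["F", "F", "F", "M", "M", "M", "M", "F"])

tprs = tpr_by_group(y_true, y_pred, groups)
print(tprs, "TPR gap:", max(tprs.values()) - min(tprs.values()))
```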

Decoupled Classifiers for Group-Fair and Efficient Machine Learning

A simple and efficient decoupling technique is provided, which can be added on top of any black-box machine learning algorithm, to learn different classifiers for different groups.
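
A rough sketch of the decoupling idea, assuming a scikit-learn-style estimator as the black-box learner: fit one copy per demographic group and route examples by group membership at prediction time. This illustrates the general idea, not the paper's exact algorithm.

```python
# Group-wise decoupling on top of an arbitrary black-box classifier (sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

class DecoupledClassifier:
    def __init__(self, base_factory):
        self.base_factory = base_factory  # callable returning a fresh estimator
        self.models = {}

    def fit(self, X, y, groups):
        # Train a separate copy of the base learner for each group.
        for g in np.unique(groups):
            m = self.base_factory()
            m.fit(X[groups == g], y[groups == g])
            self.models[g] = m
        return self

    def predict(self, X, groups):
        # Route each example to the classifier trained on its group.
        preds = np.empty(len(X), dtype=int)
        for g, m in self.models.items():
            mask = groups == g
            if mask.any():
                preds[mask] = m.predict(X[mask])
        return preds

# Hypothetical usage with synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
groups = rng.choice(["A", "B"], size=200)
y = (X[:, 0] + (groups == "B") * X[:, 1] > 0).astype(int)

clf = DecoupledClassifier(lambda: LogisticRegression()).fit(X, y, groups)
print("train accuracy:", (clf.predict(X, groups) == y).mean())
```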

Differential Privacy Has Disparate Impact on Model Accuracy

It is demonstrated that in neural networks trained using differentially private stochastic gradient descent (DP-SGD), accuracy drops much more for underrepresented classes and subgroups, resulting in a disparate reduction of model accuracy.

Deep Reinforcement Learning that Matters

Challenges posed by reproducibility, proper experimental techniques, and reporting procedures are investigated and guidelines to make future results in deep RL more reproducible are suggested.

On Empirical Comparisons of Optimizers for Deep Learning

In experiments, it is found that inclusion relationships between optimizers matter in practice and always predict optimizer comparisons, and that the popular adaptive gradient methods never underperform momentum or gradient descent.

Bringing the People Back In: Contesting Benchmark Machine Learning Datasets

The ways in which benchmark datasets in machine learning operate as infrastructure are described, and four research questions for these datasets are posed.

Selective Brain Damage: Measuring the Disparate Impact of Model Pruning

Removing PIE (Pruning Identified Exemplars) images from the test set greatly improves top-1 accuracy for both pruned and non-pruned models; these results shed light on previously unknown trade-offs and suggest that a high degree of caution should be exercised before pruning is used in sensitive domains.

Individual predictions matter: Assessing the effect of data ordering in training fine-tuned CNNs for medical imaging

We reproduced the results of CheXNet with fixed hyperparameters and 50 different random seeds to identify 14 findings in chest radiographs (x-rays). Because CheXNet fine-tunes a pre-trained DenseNet, …
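
A small sketch of the seed-variability analysis this reference describes: with hyperparameters fixed, the question is how much individual predictions change across re-training runs. The per-seed prediction matrix below is synthetic, standing in for the outputs of 50 fine-tuned CheXNet-style models.

```python
# Per-example prediction instability across random seeds (synthetic stand-in).
import numpy as np

rng = np.random.default_rng(0)
n_seeds, n_images = 50, 1000

# Each image has a "typical" binary prediction for one finding, flipped by
# seed-dependent noise whose rate differs per image.
typical = rng.integers(0, 2, size=n_images)
flip_rate = rng.beta(1, 8, size=n_images)            # most images stable, a few volatile
flips = rng.random((n_seeds, n_images)) < flip_rate
preds = np.where(flips, 1 - typical, typical)        # shape: (n_seeds, n_images)

agreement = preds.mean(axis=0)                       # fraction of seeds predicting 1
instability = np.minimum(agreement, 1 - agreement)   # 0 = all seeds agree, 0.5 = coin flip
print("images where seeds disagree on >10% of runs:", int((instability > 0.1).sum()))
```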

Optimizer Benchmarking Needs to Account for Hyperparameter Tuning

Evaluating a variety of optimizers on an extensive set of standard datasets and architectures, the authors find that Adam is the most practical solution, particularly in low-budget scenarios.

Emergent Unfairness: Normative Assumptions and Contradictions in Algorithmic Fairness-Accuracy Trade-Off Research

The intended goal of this work may be to improve the fairness of machine learning models, but it is argued that unexamined, implicit assumptions can in fact result in emergent unfairness.