Fairness testing: testing software for discrimination

@inproceedings{Galhotra2017FairnessTT,
  title={Fairness testing: testing software for discrimination},
  author={Sainyam Galhotra and Yuriy Brun and Alexandra Meliou},
  booktitle={Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2017)},
  year={2017}
}
This paper defines software fairness and discrimination and develops a testing-based method for measuring whether and how much software discriminates, focusing on causality in discriminatory behavior. Evidence of software discrimination has been found in modern software systems that recommend criminal sentences, grant access to financial products, and determine who is allowed to participate in promotions. Our approach, Themis, generates efficient test suites to measure discrimination. Given a schema… 
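The core idea of causal discrimination testing can be illustrated with a minimal sketch: generate random inputs, flip only the protected attribute, and count how often the decision changes. This is a simplified illustration of the general technique, not the paper's Themis implementation; the `decide` function and its `income`/`gender` schema are hypothetical stand-ins for software under test.

```python
import random

# Hypothetical decision function standing in for the software under test.
# Toy rule: approve whenever income is high enough (it ignores 'gender').
def decide(applicant):
    return applicant["income"] >= 50

def causal_discrimination(decide, n_samples=1000, seed=0):
    """Estimate causal discrimination w.r.t. a protected attribute:
    the fraction of random inputs whose decision changes when only
    the protected attribute is flipped."""
    rng = random.Random(seed)
    changed = 0
    for _ in range(n_samples):
        applicant = {
            "income": rng.randint(0, 100),
            "gender": rng.choice(["A", "B"]),
        }
        # Counterfactual twin: identical except for the protected attribute.
        flipped = dict(applicant)
        flipped["gender"] = "B" if applicant["gender"] == "A" else "A"
        if decide(applicant) != decide(flipped):
            changed += 1
    return changed / n_samples

print(causal_discrimination(decide))  # decide ignores gender -> 0.0
```

A discriminatory variant of `decide` (one that reads `gender`) would yield a nonzero score; the paper's contribution is generating such test suites efficiently and with statistical confidence rather than by naive random sampling.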

Citations

Coverage-Guided Fairness Testing
TLDR
The proposed Coverage-Guided Fairness Testing (CGFT) leverages combinatorial testing to generate an evenly-distributed test suite, and shows an improvement in the number of fairness violations found using CGFT compared to previous work.
Software Engineering for Fairness: A Case Study with Hyperparameter Optimization
TLDR
This paper shows that making fairness a goal during hyperparameter optimization can preserve the predictive power of a model learned from a data miner while also generating fairer results, which is the first application of hyperparameter optimization as a tool for software engineers to generate fairer software.
Software fairness
TLDR
It is argued that software fairness is analogous to software quality, and that numerous software engineering challenges in the areas of requirements, specification, design, testing, and verification need to be tackled to solve this problem.
Themis: Automatically Testing Software for Discrimination
TLDR
Themis is presented, an automated test suite generator to measure two types of discrimination, including causal relationships between sensitive inputs and program behavior, and its effectiveness on open-source software.
Avoiding Discrimination with Counterfactual Distributions
TLDR
It is described how counterfactual distributions can be used to avoid discrimination between protected groups by identifying proxy variables to omit in training and building a preprocessor that can mitigate discrimination.
Automated Directed Fairness Testing
TLDR
It is shown that Aequitas effectively generates inputs to uncover fairness violations in all the subject classifiers and systematically improves the fairness of the respective models using the generated test inputs.
Two Kinds of Discrimination in AI-Based Penal Decision-Making
TLDR
This paper distinguishes two kinds of discrimination that need to be addressed in this context, related to the well-known problem of inevitable trade-offs between incompatible accounts of statistical fairness and the specific standards of discursive fairness that apply when basing human decisions on empirical evidence.
Black box fairness testing of machine learning models
TLDR
This work proposes a methodology for auto-generation of test inputs, for the task of detecting individual discrimination, which combines two well-established techniques - symbolic execution and local explainability for effective test case generation.
Themis: automatically testing software for discrimination
TLDR
Themis is presented, an automated test suite generator to measure two types of discrimination, including causal relationships between sensitive inputs and program behavior, and its effectiveness on open-source software.
FlipTest: fairness testing via optimal transport
TLDR
Evaluating the approach on three case studies, it is shown that this provides a computationally inexpensive way to identify subgroups that may be harmed by model discrimination, including in cases where the model satisfies group fairness criteria.

References

Showing 1-10 of 101 references
Counterfactual Fairness
TLDR
This paper develops a framework for modeling fairness using tools from causal inference and demonstrates the framework on a real-world problem of fair prediction of success in law school.
JCrasher: an automatic robustness tester for Java
TLDR
JCrasher attempts to detect bugs by causing the program under test to ‘crash’, that is, to throw an undeclared runtime exception, to test the behavior of public methods under random data.
Fairness-Aware Classifier with Prejudice Remover Regularizer
TLDR
A regularization approach is proposed that is applicable to any prediction algorithm with probabilistic discriminative models; it is applied to logistic regression to empirically show its effectiveness and efficiency.
Handling Conditional Discrimination
TLDR
This work develops local techniques for handling conditional discrimination when one of the attributes is considered to be explanatory, and demonstrates that the new local techniques remove exactly the bad discrimination, allowing differences in decisions as long as they are explainable.
FairTest: Discovering Unwarranted Associations in Data-Driven Applications
TLDR
The unwarranted associations (UA) framework is introduced, a principled methodology for the discovery of unfair, discriminatory, or offensive user treatment in data-driven applications, and is instantiated in FairTest, the first comprehensive tool that helps developers check data-driven applications for unfair user treatment.
The Oracle Problem in Software Testing: A Survey
TLDR
This paper provides a comprehensive survey of current approaches to the test oracle problem and an analysis of trends in this important area of software testing research and practice.
Feedback-Directed Random Test Generation
TLDR
Experimental results indicate that feedback-directed random test generation can outperform systematic and undirected random test generation, in terms of coverage and error detection.
Preventing data errors with continuous testing
TLDR
Continuous data testing is presented, a low-overhead, delay-free technique that quickly identifies likely data errors and is implemented in the ConTest prototype for the PostgreSQL database management system.
Behavioral Execution Comparison: Are Tests Representative of Field Behavior?
TLDR
Differences between field and test executions—and in particular the finer-grained and more sophisticated ones that are measured using the authors' invariant-based model—can provide insight for developers and suggest a better method for measuring test suite quality.
Decision Theory for Discrimination-Aware Classification
TLDR
The first and second solutions exploit the reject option of probabilistic classifier(s) and the disagreement region of general classifier ensembles to reduce discrimination and relate both solutions with decision theory for better understanding of the process.