Discovering Unwarranted Associations in Data-Driven Applications with the FairTest Testing Toolkit
@article{Tramr2015DiscoveringUA,
  title={Discovering Unwarranted Associations in Data-Driven Applications with the FairTest Testing Toolkit},
  author={Florian Tram{\`e}r and Vaggelis Atlidakis and Roxana Geambasu and Daniel J. Hsu and Jean-Pierre Hubaux and Mathias Humbert and Ari Juels and Huang Lin},
  journal={ArXiv},
  year={2015},
  volume={abs/1510.02377}
}
In today's data-driven world, programmers routinely incorporate user data into complex algorithms, heuristics, and application pipelines. While often beneficial, this practice can have unintended and detrimental consequences, such as the discriminatory effects identified in Staples' online pricing algorithm and the racially offensive labels recently found in Google's image tagger. We argue that such effects are bugs that should be tested for and debugged in a manner similar to functionality…
29 Citations
Use Privacy in Data-Driven Systems: Theory and Experiments with Machine Learnt Programs
- Computer Science, CCS
- 2017
This paper presents a program analysis technique that detects instances of proxy use in a model and provides a witness identifying which parts of the corresponding program exhibit the behavior, together with a normative judgment oracle that determines whether a given witness constitutes inappropriate use.
Auditing Data Provenance in Text-Generation Models
- Computer Science, KDD
- 2019
A new model auditing technique is developed that helps users check if their data was used to train a machine learning model, and it is empirically shown that the method can successfully audit well-generalized models that are not overfitted to the training data.
Proxy Discrimination in Data-Driven Systems: Theory and Experiments with Machine Learnt Programs
- Computer Science
- 2017
A notion of proxy discrimination in data-driven systems, a class of properties indicative of bias, is formalized as the presence of protected class correlates that have causal influence on the system’s output.
Fides: Towards a Platform for Responsible Data Science
- Computer Science, SSDBM
- 2017
The authors identify the need for a data sharing and collaborative analytics platform with features to encourage (and in some cases, enforce) best practices at all stages of the data science lifecycle, describe such a platform, Fides, in the context of urban analytics, and outline a systems research agenda in responsible data science.
Algorithmic Bias: From Discrimination Discovery to Fairness-aware Data Mining
- Computer Science, KDD
- 2016
The aim of this tutorial is to survey algorithmic bias, presenting its most common variants, with an emphasis on the algorithmic techniques and key ideas developed to derive efficient solutions.
Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification
- Computer Science, SIGMOD Conference
- 2022
A broad analysis of 13 fair classification approaches and additional variants is contributed, evaluating their correctness, fairness, efficiency, scalability, robustness to data errors, sensitivity to the underlying ML model, data efficiency, and stability using a variety of metrics and real-world datasets.
Synthetic Data for Social Good
- Computer Science, ArXiv
- 2017
Important use cases for synthetic data that challenge the state of the art in privacy-preserving data generation are discussed, and DataSynthesizer is described, a dataset generation tool that takes a sensitive dataset as input and generates a structurally and statistically similar synthetic dataset, with strong privacy guarantees, as output.
Accountability for Privacy and Fairness Violations in Data-Driven Systems with Limited Access (Ph.D. Thesis Prospectus, Carnegie Mellon University, Department of Electrical and Computer Engineering)
- Computer Science
- 2017
This doctoral thesis develops AdFisher as a general framework to perform information flow experiments on web systems and uses it to evaluate societal values like discrimination, transparency, and choice on Google’s advertising system.
Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries
- Political Science, Front. Big Data
- 2019
A framework is presented for identifying a broad range of threats in research and practice around social data, including biases and inaccuracies at the source of the data as well as those introduced during processing.
Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products
- Business, AIES
- 2019
The audit design and structured disclosure procedure used in the Gender Shades study are outlined, and new performance results from the targeted companies IBM, Microsoft, and Megvii (Face++) on the Pilot Parliaments Benchmark (PPB), as of August 2018, are presented.
References
Showing 1-10 of 86 references
XRay: Enhancing the Web's Transparency with Differential Correlation
- Computer Science, USENIX Security Symposium
- 2014
XRay is developed, the first fine-grained, robust, and scalable personal data tracking system for the Web, which achieves high precision and recall by correlating data from a surprisingly small number of extra accounts.
The reusable holdout: Preserving validity in adaptive data analysis
- Computer Science, Science
- 2015
A new approach for addressing the challenges of adaptivity based on insights from privacy-preserving data analysis is demonstrated, and how to safely reuse a holdout data set many times to validate the results of adaptively chosen analyses is shown.
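The holdout-reuse idea above can be illustrated with a short sketch. This is a simplified, illustrative Thresholdout-style mechanism (the function name, parameters, and data below are ours, not from the paper): each adaptive query is answered from the training estimate unless it deviates from the holdout estimate beyond a noisy threshold, in which case a noised holdout answer is returned instead.

```python
import random

def thresholdout(train_vals, holdout_vals, threshold=0.04, sigma=0.01, seed=0):
    """Illustrative Thresholdout-style sketch: trust the training-set
    estimate for each query unless it disagrees with the holdout
    estimate, limiting how much the analyst learns about the holdout."""
    rng = random.Random(seed)
    answers = []
    for t, h in zip(train_vals, holdout_vals):
        if abs(t - h) > threshold + rng.gauss(0, sigma):
            # Deviation detected: answer from the holdout, with noise.
            answers.append(h + rng.gauss(0, sigma))
        else:
            # Training and holdout agree: the training answer is safe.
            answers.append(t)
    return answers
```

With `sigma=0.0` the behavior is deterministic: agreeing estimates pass through the training value, and a large train/holdout gap falls back to the holdout value.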
Automated Experiments on Ad Privacy Settings
- Computer Science, Proc. Priv. Enhancing Technol.
- 2015
AdFisher, an automated tool that explores how user behaviors, Google's ads, and Ad Settings interact, finds that Ad Settings was opaque about some features of a user's profile, that it does provide some choice over advertisements, and that these choices can lead to seemingly discriminatory ads.
Certifying and Removing Disparate Impact
- Computer Science, KDD
- 2015
This work links disparate impact to a measure of classification accuracy that, while known, has received relatively little attention, and proposes a test for disparate impact based on how well the protected class can be predicted from the other attributes.
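The legal notion of disparate impact that this line of work certifies against can be shown with a minimal sketch (the function name and data are illustrative, not from the paper): compute the ratio of positive-outcome rates between the protected and unprotected groups, and flag ratios below 0.8 per the "four-fifths rule".

```python
def disparate_impact_ratio(outcomes, protected):
    """Ratio of positive-outcome rates between the protected and
    unprotected groups; values below 0.8 fail the four-fifths rule."""
    pos_prot = [o for o, p in zip(outcomes, protected) if p]
    pos_unprot = [o for o, p in zip(outcomes, protected) if not p]
    rate_prot = sum(pos_prot) / len(pos_prot)
    rate_unprot = sum(pos_unprot) / len(pos_unprot)
    return rate_prot / rate_unprot
```

For example, if the protected group receives the positive outcome half as often as the unprotected group, the ratio is 0.5 and the four-fifths rule is violated.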
Fairness-Aware Classifier with Prejudice Remover Regularizer
- Computer Science, ECML/PKDD
- 2012
A regularization approach is proposed that is applicable to any prediction algorithm with a probabilistic discriminative model; it is applied to logistic regression, and its effectiveness and efficiency are shown empirically.
Exposing Inconsistent Web Search Results with Bobble
- Computer Science, PAM
- 2014
Bobble is presented, a Web browser extension that contemporaneously executes a user's Google search query from a variety of different world-wide vantage points under a range of different conditions, alerting the user to the extent of inconsistency present in the set of search results returned to them by Google.
Measuring Price Discrimination and Steering on E-commerce Web Sites
- Computer Science, Internet Measurement Conference
- 2014
This paper develops a methodology for accurately measuring when price steering and discrimination occur and implements it for a variety of e-commerce web sites, and investigates the effect of user behaviors on personalization.
Sunlight: Fine-grained Targeting Detection at Scale with Statistical Confidence
- Computer Science, CCS
- 2015
Sunlight's default configuration strikes a balance that makes it the first system able to diagnose targeting at fine granularity, at scale, and with solid statistical justification of its results; it is used to run two measurement studies of targeting on the web, both the largest of their kind.
A Methodology for Direct and Indirect Discrimination Prevention in Data Mining
- Computer Science, IEEE Transactions on Knowledge and Data Engineering
- 2013
This paper discusses how to clean training data sets and outsourced data sets so that direct and/or indirect discriminatory decision rules are converted to legitimate (nondiscriminatory) classification rules, and proposes new techniques applicable to direct and indirect discrimination prevention, individually or both at once.