# Certifying Robustness to Programmable Data Bias in Decision Trees

@inproceedings{Meyer2021CertifyingRT, title={Certifying Robustness to Programmable Data Bias in Decision Trees}, author={Anna P. Meyer and Aws Albarghouthi and Loris D'antoni}, booktitle={Neural Information Processing Systems}, year={2021} }

Datasets can be biased due to societal inequities, human biases, underrepresentation of minorities, etc. Our goal is to certify that models produced by a learning algorithm are pointwise-robust to potential dataset biases. This is a challenging problem: it entails learning models for a large, or even infinite, number of datasets, ensuring that they all produce the same prediction. We focus on decision-tree learning due to the interpretable nature of the models. Our approach allows…
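To make "pointwise-robust" concrete, here is a minimal brute-force sketch. It is not the paper's certification method. It assumes the bias model is "up to `k` binary labels may have been flipped" and uses a hypothetical one-feature decision stump (`train_stump`) as the learner. It retrains on every dataset in that bias set and checks that the prediction for one test point never changes. The enumeration grows combinatorially, which is why the abstract calls the problem challenging.

```python
from itertools import combinations

def train_stump(xs, ys):
    """Learn a one-feature threshold stump that minimizes training errors.

    Hypothetical toy learner: labels are 0/1, ties in a majority vote go to 0,
    and the first threshold with the fewest errors wins.
    """
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = int(2 * sum(left) > len(left))
        # An empty right side simply reuses the left label.
        right_label = int(2 * sum(right) > len(right)) if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label

def is_pointwise_robust(xs, ys, x_test, k):
    """Return True if no choice of up to k label flips changes the prediction at x_test."""
    baseline = train_stump(xs, ys)(x_test)
    for m in range(1, k + 1):
        for flipped_idx in combinations(range(len(ys)), m):
            biased = list(ys)
            for i in flipped_idx:
                biased[i] = 1 - biased[i]
            if train_stump(xs, biased)(x_test) != baseline:
                return False
    return True

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
far_from_boundary = is_pointwise_robust(xs, ys, x_test=1, k=1)
near_boundary = is_pointwise_robust(xs, ys, x_test=5, k=1)
```

In this toy data, a point far from the class boundary stays stable under any single flip, while a point next to the boundary does not.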

## 6 Citations

### Certifying Data-Bias Robustness in Linear Regression

- Computer Science
- ArXiv
- 2022

This work presents a technique for certifying whether linear regression models are pointwise-robust to label bias in the training dataset, i.e., whether bounded perturbations to the labels of a training dataset result in models that change the prediction of test points.
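As an illustration of why label bias is tractable to reason about for linear regression, the sketch below relies on a standard fact: an ordinary least squares prediction is a linear function of the training labels. It is not the cited paper's technique. It assumes simple one-feature regression with an intercept, and a bias model in which at most `k` labels are each shifted by at most `delta`. All function names are hypothetical.

```python
def prediction_weights(xs, x_test):
    """Weights w such that the OLS prediction at x_test equals sum(w[i] * y[i])."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x_test - x_bar) * (x - x_bar) / sxx for x in xs]

def prediction_interval(xs, ys, x_test, k, delta):
    """Exact range of the prediction when up to k labels move by at most delta each."""
    w = prediction_weights(xs, x_test)
    center = sum(wi * yi for wi, yi in zip(w, ys))
    # The worst case perturbs the k labels with the largest absolute weight.
    radius = delta * sum(sorted((abs(wi) for wi in w), reverse=True)[:k])
    return center - radius, center + radius

def is_robust(xs, ys, x_test, k, delta, threshold):
    """True if every reachable prediction falls strictly on one side of threshold."""
    lo, hi = prediction_interval(xs, ys, x_test, k, delta)
    return lo > threshold or hi < threshold

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 3.0]
lo, hi = prediction_interval(xs, ys, x_test=1.5, k=2, delta=1.0)
```

Because the weights depend only on the features, the interval is computed once per test point without retraining on any perturbed dataset.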

### FARE: Provably Fair Representation Learning

- Computer Science
- 2022

This work proposes Fairness with Restricted Encoders (FARE), the first FRL method with provable fairness guarantees, and develops and applies a practical statistical procedure that computes a high-confidence upper bound on the unfairness of any downstream classifier.

### FARE: Provably Fair Representation Learning

- Computer Science
- ArXiv
- 2022

This work proposes Fairness with Restricted Encoders (FARE), the first FRL method with provable fairness guarantees, and develops and applies a practical procedure that computes a high-confidence upper bound on the unfairness of any downstream classifier.

### BagFlip: A Certified Defense against Data Poisoning

- Computer Science
- ArXiv
- 2022

This work presents BagFlip, a model-agnostic certified approach that can effectively defend against both trigger-less and backdoor attacks. It is equal to or more effective than state-of-the-art approaches for trigger-less attacks and more effective for backdoor attacks.

### Crab: Learning Certifiably Fair Predictive Models in the Presence of Selection Bias

- Computer Science
- ArXiv
- 2022

This research shows that CRAB-MX not only achieves performance comparable to the baselines but also allows perfect fairness by achieving zero equal opportunity difference.
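For readers unfamiliar with the metric named above, equal opportunity difference is commonly defined as the gap in true positive rates between two groups. The sketch below follows that common definition. It is not taken from the cited paper, the sign convention (group 1 minus group 0) is an assumption, and the function names are hypothetical.

```python
def true_positive_rate(y_true, y_pred):
    """Fraction of actual positives (label 1) that were predicted positive."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_on_positives:
        raise ValueError("group has no positive examples")
    return sum(preds_on_positives) / len(preds_on_positives)

def equal_opportunity_difference(y_true, y_pred, group):
    """TPR of group 1 minus TPR of group 0; zero means equal opportunity holds."""
    rates = {}
    for g in (0, 1):
        g_true = [t for t, s in zip(y_true, group) if s == g]
        g_pred = [p for p, s in zip(y_pred, group) if s == g]
        rates[g] = true_positive_rate(g_true, g_pred)
    return rates[1] - rates[0]

y_true = [1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1]
group = [0, 0, 0, 1, 1, 1]
gap = equal_opportunity_difference(y_true, y_pred, group)
perfect = equal_opportunity_difference(y_true, y_true, group)
```

Here group 0 has one of its two positives recovered and group 1 has both, so the gap is nonzero, while a classifier that reproduces the true labels yields a gap of zero.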

### Proving Data-Poisoning Robustness in Decision Trees

- Computer Science
- Communications of the ACM
- 2023

This work presents a sound verification technique based on abstract interpretation and implements it in a tool called Antidote, which abstractly trains decision trees for an intractably large space of possible poisoned datasets and can produce proofs that the corresponding prediction would not have changed had the training set been tampered with or not.

## References

Showing 1-10 of 41 references.

### Proving data-poisoning robustness in decision trees

- Computer Science
- PLDI
- 2020

This work presents a sound verification technique based on abstract interpretation and implements it in a tool called Antidote, which abstractly trains decision trees for an intractably large space of possible poisoned datasets and can produce proofs that, for a given input, the corresponding prediction would not have changed had the training set been tampered with or not.

### Certified Robustness to Label-Flipping Attacks via Randomized Smoothing

- Computer Science, Mathematics
- ICML
- 2020

This work presents a unifying view of randomized smoothing over arbitrary functions, and uses this novel characterization to propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.

### Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks

- Computer Science
- AAAI
- 2021

This work proves the intrinsic certified robustness of bagging against data poisoning attacks and shows that bagging with an arbitrary base learning algorithm provably predicts the same label for a testing example when the number of modified, deleted, and/or inserted training examples is bounded by a threshold.

### Sever: A Robust Meta-Algorithm for Stochastic Optimization

- Computer Science
- ICML
- 2019

This work introduces a new meta-algorithm that can take in a base learner such as least squares or stochastic gradient descent, and harden the learner to be resistant to outliers, and finds that in both cases it has substantially greater robustness than several baselines.

### Technical note: Bias and the quantification of stability

- Computer Science
- Machine Learning
- 2004

This paper introduces a method for quantifying stability, based on a measure of the agreement between concepts, and discusses the relationships among stability, predictive accuracy, and bias.

### Provably Robust Boosted Decision Stumps and Trees against Adversarial Attacks

- Computer Science
- NeurIPS
- 2019

This paper shows how to efficiently calculate and optimize an upper bound on the robust loss, which leads to state-of-the-art robust test error for boosted trees on MNIST (12.5% for $\epsilon_\infty=0.3$), FMNIST, and CIFAR-10 (74.7%).

### Ensuring Fairness Beyond the Training Data

- Computer Science
- NeurIPS
- 2020

This work develops classifiers that are fair not only with respect to the training distribution, but also for a class of distributions that are weighted perturbations of the training samples.

### Robust Decision Trees Against Adversarial Examples

- Computer Science
- ICML
- 2019

The proposed algorithms can substantially improve the robustness of tree-based models against adversarial examples and present efficient implementations for classical information gain based trees as well as state-of-the-art tree boosting models such as XGBoost.

### Decision Tree Instability and Active Learning

- Computer Science
- ECML
- 2007

A new measure of decision tree stability is introduced, and three aspects of active learning stability are defined, which are found to improve the stability and accuracy of C4.5 in the active learning setting.

### Robustness meets algorithms

- Computer Science, Mathematics
- Commun. ACM
- 2021

This work gives the first efficient algorithm for estimating the parameters of a high-dimensional Gaussian that is able to tolerate a constant fraction of corruptions that is independent of the dimension.