Corpus ID: 238215873

Learning Models for Actionable Recourse

@inproceedings{Ross2021LearningMF,
  title={Learning Models for Actionable Recourse},
  author={Alexis Ross and Himabindu Lakkaraju and Osbert Bastani},
  booktitle={NeurIPS},
  year={2021}
}
As machine learning models are increasingly deployed in high-stakes domains such as legal and financial decision-making, there has been growing interest in post-hoc methods for generating counterfactual explanations. Such explanations provide individuals adversely impacted by predicted outcomes (e.g., an applicant denied a loan) with recourse—i.e., a description of how they can change their features to obtain a positive outcome. We propose a novel algorithm that leverages adversarial training… 
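The abstract is cut off before it describes the algorithm, but for context, below is a minimal sketch of the generic gradient-based counterfactual search that post-hoc recourse methods in this space rely on. The classifier architecture, step size, and cost weight are illustrative assumptions; this is not the paper's adversarial-training procedure itself.

```python
# Minimal sketch of gradient-based counterfactual (recourse) search.
# Everything here (model, lr, lam) is an illustrative assumption, not
# the algorithm proposed in the paper.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 1))  # assumed classifier
x = torch.zeros(1, 5)                  # applicant currently receiving a negative outcome
x_cf = x.clone().requires_grad_(True)  # counterfactual we will optimize
opt = torch.optim.Adam([x_cf], lr=0.05)
lam = 0.1                              # weight on the cost of changing features

for _ in range(500):
    opt.zero_grad()
    logit = model(x_cf)
    # Push the prediction toward the favorable class while staying close to x.
    loss = torch.relu(1.0 - logit).mean() + lam * torch.norm(x_cf - x, p=1)
    loss.backward()
    opt.step()

recourse = (x_cf - x).detach()         # suggested feature changes
```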

Citations of this paper

On the Adversarial Robustness of Causal Algorithmic Recourse
TLDR
This work formalizes the adversarially robust recourse problem, shows that methods offering minimally costly recourse fail to be robust, and derives bounds on the extra cost incurred by individuals seeking robust recourse.
CounterNet: End-to-End Training of Counterfactual Aware Predictions
TLDR
The results show that CounterNet generates high-quality predictions and corresponding counterfactual explanations (with well-balanced cost-invalidity trade-offs) for any new input instance, significantly faster than existing state-of-the-art baselines.
Don't Lie to Me! Robust and Efficient Explainability with Verified Perturbation Analysis
TLDR
EVA (Explaining using Verified perturbation Analysis) is introduced: the first explainability method guaranteed to perform an exhaustive exploration of a perturbation space, leveraging the beneficial properties of verified perturbation analysis to efficiently characterize the input variables most likely to drive the model's decision.
Counterfactual Explanations for Natural Language Interfaces
TLDR
This work proposes a novel approach for generating explanations of a natural language interface based on semantic parsing that focuses on counterfactual explanations, which are post-hoc explanations that describe to the user how they could have minimally modified their utterance to achieve their desired goal.

References

SHOWING 1-10 OF 41 REFERENCES
Model-Agnostic Counterfactual Explanations for Consequential Decisions
TLDR
This work builds on standard theory and tools from formal verification and proposes a novel algorithm that solves a sequence of satisfiability problems, where both the distance function (objective) and predictive model (constraints) are represented as logic formulae.
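As a toy illustration of the constraint-solving view this reference takes, the sketch below encodes a counterfactual query for an assumed two-feature linear model. The real method solves a sequence of satisfiability queries over logic encodings of the model and distance; here z3's Optimize is swapped in for brevity, and the weights and instance are made up.

```python
# Toy sketch of counterfactual search as constraint solving.
# The linear model (w, b) and the applicant x_now are assumed values;
# using z3's Optimize directly is a simplification of the reference's
# sequence-of-satisfiability-queries approach.
from z3 import Optimize, Real, sat

x_now = [2.0, 0.5]                    # current (denied) applicant
w, b = [1.5, -2.0], -1.0              # assumed linear model

opt = Optimize()
xs = [Real(f"x{i}") for i in range(2)]
ds = [Real(f"d{i}") for i in range(2)]

opt.add(sum(wi * xi for wi, xi in zip(w, xs)) + b >= 0)    # favorable prediction
for d, xi, x0 in zip(ds, xs, x_now):                       # d_i >= |x_i - x0_i|
    opt.add(d >= xi - x0, d >= x0 - xi)
opt.minimize(sum(ds))                                      # minimal L1 change

if opt.check() == sat:
    m = opt.model()
    print([m[xi] for xi in xs])
```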
Explaining and Harnessing Adversarial Examples
TLDR
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, a view supported by new quantitative results that also yield the first explanation of the most intriguing fact about adversarial examples: their generalization across architectures and training sets.
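The attack introduced in this reference is the fast gradient sign method; a compact sketch follows, with the model, labels, and epsilon as illustrative placeholders.

```python
# Fast gradient sign method (FGSM) sketch. Model, inputs, and eps are
# placeholders; only the update rule follows the reference.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm(model: nn.Module, x: torch.Tensor, y: torch.Tensor, eps: float = 0.1):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction of the gradient's sign to increase the loss.
    return (x_adv + eps * x_adv.grad.sign()).detach()
```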
Equality of Opportunity in Supervised Learning
TLDR
This work proposes a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features and shows how to optimally adjust any learned predictor so as to remove discrimination according to this definition.
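The "equal opportunity" criterion from this reference reduces to equal true-positive rates across groups; the short check below uses made-up predictions and a made-up sensitive attribute purely for illustration.

```python
# Sketch: checking equal opportunity = equal true-positive rates across
# groups. The arrays below are illustrative, not real data.
import numpy as np

def true_positive_rate(y_true, y_pred, mask):
    pos = (y_true == 1) & mask
    return (y_pred[pos] == 1).mean()

y_true = np.array([1, 1, 0, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])
group  = np.array([0, 0, 0, 1, 1, 1])   # assumed sensitive attribute

tpr_a = true_positive_rate(y_true, y_pred, group == 0)
tpr_b = true_positive_rate(y_true, y_pred, group == 1)
print(f"TPR gap: {abs(tpr_a - tpr_b):.2f}")  # zero gap <=> equal opportunity
```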
Learning Cost-Effective and Interpretable Treatment Regimes
TLDR
This work proposes a novel objective for constructing a decision list that maximizes outcomes for the population while minimizing overall costs, and employs a variant of the Upper Confidence Bound for Trees strategy with customized checks that prune the search space effectively.
Actionable Recourse in Linear Classification
TLDR
An integer programming toolkit is presented to measure the feasibility and difficulty of recourse in a target population and to generate a list of actionable changes a person can make to obtain a desired outcome, and the paper illustrates how recourse can be significantly affected by common modeling practices.
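This reference formulates recourse as an integer program over discretized, actionable features; the sketch below shows only the simplified continuous, L2-cost special case for a linear classifier, which has a closed form. The weights, instance, and actionability mask are assumed.

```python
# Simplified recourse for a linear classifier: minimal-L2 change over
# actionable features that reaches the decision boundary. This is a
# continuous relaxation, not the reference's integer program.
import numpy as np

w = np.array([1.0, -2.0, 0.5])          # assumed model weights
b = -1.0
x = np.array([0.2, 0.6, 0.1])           # applicant with w @ x + b < 0 (denied)
actionable = np.array([1.0, 1.0, 0.0])  # last feature is immutable

w_a = w * actionable
margin = -(w @ x + b)                    # distance (in score) to the boundary
action = (margin / (w_a @ w_a)) * w_a    # minimal-L2 change using actionable features only
assert w @ (x + action) + b >= -1e-9     # new point lies on the decision boundary
```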
Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods
TLDR
It is demonstrated how extremely biased (racist) classifiers crafted by the proposed framework can easily fool popular explanation techniques such as LIME and SHAP into generating innocuous explanations which do not reflect the underlying biases.
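The core trick in this reference is a "scaffolded" classifier that behaves differently on the off-distribution samples explainers generate; below is a rough, self-contained stand-in where the data, the crude perturbation proxy, and the two inner models are all assumptions.

```python
# Rough sketch of a scaffolded classifier that routes explainer-style
# (perturbed, out-of-distribution) queries to an innocuous model.
# Data, models, and the perturbation proxy are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_real = rng.normal(0, 1, size=(500, 5))                      # assumed in-distribution data
X_perturbed = X_real + rng.normal(0, 2, size=X_real.shape)    # crude stand-in for LIME-style samples

# OOD detector: distinguishes real rows from perturbation-style rows.
ood = RandomForestClassifier(n_estimators=100, random_state=0)
ood.fit(np.vstack([X_real, X_perturbed]),
        np.r_[np.zeros(len(X_real)), np.ones(len(X_perturbed))])

def biased_model(x):       # placeholder for the discriminatory classifier
    return int(x[0] > 0)   # e.g. decides on a sensitive feature

def innocuous_model(x):    # placeholder shown to the explainer
    return int(x[1] > 0)

def scaffolded_predict(x):
    is_ood = ood.predict(x.reshape(1, -1))[0] == 1
    return innocuous_model(x) if is_ood else biased_model(x)
```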
Explaining machine learning classifiers through diverse counterfactual explanations
TLDR
This work proposes a framework for generating and evaluating a diverse set of counterfactual explanations based on determinantal point processes, and provides metrics that enable comparison of counterfactual-based methods to other local explanation methods.
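To make the diversity idea concrete, the sketch below generates a small set of counterfactuals for an assumed logistic model, swapping the reference's determinantal point process objective for a simpler pairwise-distance bonus; all weights, trade-off coefficients, and data are made up.

```python
# Sketch of generating a diverse *set* of counterfactuals. A pairwise
# distance bonus stands in for the reference's DPP objective; the
# logistic model, lambdas, and instance are assumptions.
import numpy as np

rng = np.random.default_rng(0)
w, b = np.array([1.2, -0.8, 0.5]), -0.3    # assumed logistic-regression weights
x = np.array([0.1, 0.9, 0.2])              # instance with an unfavorable prediction

def score(z):                              # probability of the favorable class
    return 1.0 / (1.0 + np.exp(-(w @ z + b)))

def objective(cfs, lam_prox=0.5, lam_div=0.2):
    validity = sum((score(z) - 1.0) ** 2 for z in cfs)            # push toward class 1
    proximity = lam_prox * sum(np.abs(z - x).sum() for z in cfs)  # stay close to x
    diversity = sum(np.linalg.norm(a - c) for i, a in enumerate(cfs)
                    for c in cfs[i + 1:])                         # spread the set out
    return validity + proximity - lam_div * diversity

# Simple hill climbing over a set of three counterfactuals.
best = [x + rng.normal(0, 0.5, size=3) for _ in range(3)]
for _ in range(2000):
    cand = [z + rng.normal(0, 0.05, size=3) for z in best]
    if objective(cand) < objective(best):
        best = cand
```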
Interpretable classification models for recidivism prediction
TLDR
A recent method called supersparse linear integer models is used to produce accurate, transparent and interpretable scoring systems along the full ROC curve, which can be used for decision making for many different use cases.
Interpreting Blackbox Models via Model Extraction
TLDR
A novel algorithm for extracting decision tree explanations that actively samples new training points to avoid overfitting is devised and several insights provided by the interpretations are described, including a causal issue validated by a physician.
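As a loose illustration of the extraction idea, the sketch below distills an assumed random-forest blackbox into a shallow decision tree and adds new training points where the surrogate is least confident; this confidence-based sampling is only a crude stand-in for the reference's active sampling scheme.

```python
# Sketch: distill a blackbox into a decision tree, actively sampling
# points where the surrogate is least confident. Blackbox, data, and
# sampling scheme are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
blackbox = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

X_pool = X.copy()
for _ in range(5):
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(X_pool, blackbox.predict(X_pool))          # imitate the blackbox's labels
    candidates = rng.normal(size=(2000, 4))
    conf = tree.predict_proba(candidates).max(axis=1)   # surrogate confidence
    X_pool = np.vstack([X_pool, candidates[np.argsort(conf)[:200]]])

fidelity = (tree.predict(X) == blackbox.predict(X)).mean()
print(f"surrogate fidelity: {fidelity:.2f}")
```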
Generating Natural Adversarial Examples
TLDR
This paper proposes a framework to generate natural and legible adversarial examples that lie on the data manifold, by searching in semantic space of dense and continuous data representation, utilizing the recent advances in generative adversarial networks.