Corpus ID: 235212189

Fooling Partial Dependence via Data Poisoning

Hubert Baniecki, Wojciech Kretowicz, Przemysław Biecek
Many methods have been developed to understand complex predictive models, and high expectations are placed on post-hoc model explainability. It turns out that such explanations are neither robust nor trustworthy, and they can be fooled. This paper presents techniques for attacking Partial Dependence (plots, profiles, PDP), which is among the most popular methods of explaining any predictive model trained on tabular data. We showcase that PD can be manipulated in an adversarial manner, which is…
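For context, the partial dependence of a model f on feature j at value z is the average prediction after fixing that feature across the data: PD_j(z) = (1/n) Σ_i f(x_i with x_ij := z). A minimal NumPy sketch (the toy model and data below are illustrative assumptions, not the paper's):

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """Partial dependence of `predict` on one feature.

    For each grid value z, the chosen feature column is overwritten with z
    in every row of X, and the predictions are averaged:
        PD(z) = mean_i predict(x_i with x_i[feature] = z)
    """
    pd_values = []
    for z in grid:
        X_mod = X.copy()
        X_mod[:, feature] = z              # intervene on the chosen feature
        pd_values.append(predict(X_mod).mean())
    return np.array(pd_values)

# Toy linear model f(x) = 2*x0 + x1: its PD curve along feature 0
# is a line with slope 2, shifted by the mean of x1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
f = lambda X: 2 * X[:, 0] + X[:, 1]
grid = np.linspace(-1, 1, 5)
pd = partial_dependence(f, X, feature=0, grid=grid)
```

Because PD averages over the data distribution, poisoning that data (the attack studied in the paper) shifts the curve without touching the model itself.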


Do not explain without context: addressing the blind spot of model explanations
It is postulated that obtaining robust and useful explanations always requires supporting them with a broader context, and that many model explanations depend directly or indirectly on the choice of the reference data distribution.


Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods
It is demonstrated how extremely biased (racist) classifiers crafted by the proposed framework can easily fool popular explanation techniques such as LIME and SHAP into generating innocuous explanations which do not reflect the underlying biases.
Interpretable Deep Learning under Fire
This work presents ADV^2, a new class of attacks that generate adversarial inputs which not only mislead target DNNs but also deceive their coupled interpretation models, and identifies the prediction-interpretation gap as one root cause of this vulnerability.
Underspecification Presents Challenges for Credibility in Modern Machine Learning
This work shows the need to explicitly account for underspecification in modeling pipelines intended for real-world deployment, and demonstrates that this problem appears in a wide variety of practical ML pipelines.
Fooling Neural Network Interpretations via Adversarial Model Manipulation
It is claimed that the stability of a neural network interpretation method with respect to the authors' adversarial model manipulation is an important criterion for developing robust and reliable interpretation methods.
Evaluating Explanation Methods for Deep Learning in Security
Criteria for comparing and evaluating explanation methods in the context of computer security are introduced; based on these criteria, six popular explanation methods are investigated, along with their utility in security systems for malware detection and vulnerability discovery.
dalex: Responsible Machine Learning with Interactive Explainability and Fairness in Python
dalex, a Python package implementing a model-agnostic interface for interactive model exploration, adopts the design crafted through the development of various tools for responsible machine learning and thus aims at unifying the existing solutions.
A Responsible Machine Learning Workflow with Focus on Interpretable Models, Post-hoc Explanation, and Discrimination Testing
A template workflow is provided for machine learning applications that require high accuracy and interpretability and that mitigate risks of discrimination, offering a viable approach for training and evaluating machine learning systems for high-stakes, human-centered, or regulated applications using common Python programming tools.
DALEX: Explainers for Complex Predictive Models in R
  • P. Biecek
  • Computer Science
  • J. Mach. Learn. Res.
  • 2018
A consistent collection of explainers for predictive models, a.k.a. black boxes, based on a uniform standardized grammar of model exploration which may be easily extended.
pdp: An R Package for Constructing Partial Dependence Plots
Complex nonparametric models (like neural networks, random forests, and support vector machines) are more common than ever in predictive analytics, especially when dealing with large observational…
Interpretation of Neural Networks is Fragile
This paper systematically characterizes the fragility of several widely-used feature-importance interpretation methods (saliency maps, relevance propagation, and DeepLIFT) on ImageNet and CIFAR-10, and extends these results to show that interpretations based on exemplars (e.g. influence functions) are similarly fragile.