Fairness warnings and fair-MAML: learning fairly with minimal data
TLDR: We propose Fairness Warnings, a model-agnostic algorithm that provides interpretable boundary conditions for when a fairly trained model may not behave fairly on similar but slightly different tasks within a given domain.
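The core idea lends itself to a short sketch: probe the trained model under simple, parameterized distribution shifts, record whether a fairness metric stays within tolerance, and fit an interpretable model over the shift parameters to serve as the warning. The minimal illustration below assumes a demographic-parity criterion, per-group mean shifts, and a shallow decision tree as the interpretable summary; the toy data and helper names are hypothetical, not the paper's exact algorithm.

```python
# Fairness-Warnings-style sketch (illustrative, not the paper's code).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Toy data: one feature correlated with a binary sensitive attribute.
n = 2000
a = rng.integers(0, 2, n)                       # sensitive attribute
x = rng.normal(a * 0.5, 1.0, n)                 # feature
y = (x + rng.normal(0, 1, n) > 0).astype(int)   # label

clf = LogisticRegression().fit(np.c_[x], y)     # the "fairly trained" model

def dp_gap(model, x, a):
    """Demographic parity gap: |P(yhat=1 | a=1) - P(yhat=1 | a=0)|."""
    yhat = model.predict(np.c_[x])
    return abs(yhat[a == 1].mean() - yhat[a == 0].mean())

# Probe the model under per-group mean shifts and record whether the
# fairness metric stays within a tolerance.
shifts, fair = [], []
for d0 in np.linspace(-1, 1, 9):
    for d1 in np.linspace(-1, 1, 9):
        xs = x + np.where(a == 1, d1, d0)
        shifts.append([d0, d1])
        fair.append(dp_gap(clf, xs, a) < 0.1)

# Interpretable boundary conditions: a shallow tree over shift parameters.
warning = DecisionTreeClassifier(max_depth=2).fit(shifts, fair)
print(export_text(warning, feature_names=["shift_group0", "shift_group1"]))
```

The printed tree reads directly as boundary conditions: which combinations of group-wise shifts push the parity gap past tolerance.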
Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods
TLDR: We propose a novel scaffolding technique that effectively hides the biases of any given classifier by allowing an adversarial entity to craft an arbitrary desired explanation.
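Because LIME and SHAP query the model on synthetic perturbation points, a classifier that detects those off-manifold queries can answer them with an innocuous model while behaving arbitrarily on real inputs. The sketch below assumes Gaussian LIME-style perturbations and sklearn models; the class and helper are illustrative, not the authors' released implementation.

```python
# Scaffolding sketch: route perturbation-like queries to an innocuous model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

class ScaffoldedClassifier:
    def __init__(self, biased_model, innocuous_model, ood_detector):
        self.f, self.psi, self.ood = biased_model, innocuous_model, ood_detector

    def predict(self, X):
        # Real data gets the biased model's answers; points flagged as
        # explainer perturbations get the innocuous model's answers.
        is_real = self.ood.predict(X) == 1
        return np.where(is_real, self.f.predict(X), self.psi.predict(X))

def fit_ood_detector(X_real, rng=np.random.default_rng(0)):
    # Train a detector to separate real data from the kind of perturbations
    # LIME draws (Gaussian noise around instances is an assumption here).
    X_pert = X_real + rng.normal(0, 1, X_real.shape)
    X = np.vstack([X_real, X_pert])
    y = np.r_[np.ones(len(X_real)), np.zeros(len(X_pert))]
    return RandomForestClassifier(n_estimators=100).fit(X, y)
```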
How can we fool LIME and SHAP? Adversarial Attacks on Post hoc Explanation Methods
TLDR: We demonstrate that post hoc explanation techniques that rely on input perturbations, such as LIME and SHAP, are not reliable.
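The unreliability stems from perturbation samples being detectably off the data manifold. The toy demonstration below, with assumed Gaussian data and LIME-style noise, shows that a simple classifier separates real points from perturbations well above chance, which is exactly the signal the scaffolding attack above exploits.

```python
# Toy check: LIME-style perturbations are distinguishable from real data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_real = rng.multivariate_normal([0, 0], [[1, 0.9], [0.9, 1]], 1000)
X_pert = X_real + rng.normal(0, 1, X_real.shape)   # LIME-style noise

X = np.vstack([X_real, X_pert])
y = np.r_[np.ones(1000), np.zeros(1000)]
acc = cross_val_score(RandomForestClassifier(), X, y, cv=5).mean()
print(f"real-vs-perturbation accuracy: {acc:.2f}")  # well above 0.5
```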
Assessing the Local Interpretability of Machine Learning Models
TLDR: The increasing adoption of machine learning tools has led to calls for accountability via model interpretability.
Fair Meta-Learning: Learning How to Learn Fairly
TLDR: We introduce a fair meta-learning approach called Fair-MAML that allows practitioners to train fair machine learning models from only a few examples when data from related tasks is available.
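A minimal first-order sketch of the idea in PyTorch: a fairness penalty is added to each task's loss inside the meta-learning loop, so adapted models trade accuracy against fairness even after a single gradient step. The synthetic task generator, the squared demographic-parity penalty, and the first-order approximation are assumptions; the paper's exact regularizers and MAML variant may differ.

```python
# First-order Fair-MAML-style sketch (illustrative, not the paper's code).
import torch

torch.manual_seed(0)
model = torch.nn.Linear(2, 1)
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-2)
inner_lr, gamma = 0.1, 1.0   # gamma weights the fairness penalty

def sample_task(n=64):
    """Toy binary task whose label correlates with a sensitive bit a."""
    a = torch.randint(0, 2, (n, 1)).float()
    x = torch.randn(n, 1) + 0.5 * a
    y = (x + 0.2 * torch.randn(n, 1) > 0).float()
    return torch.cat([x, a], dim=1), y, a.squeeze(1)

def task_loss(params, X, y, a):
    # Standard loss plus a squared demographic-parity penalty.
    p = torch.sigmoid(X @ params[0].t() + params[1]).squeeze(1)
    bce = torch.nn.functional.binary_cross_entropy(p, y.squeeze(1))
    dp = (p[a == 1].mean() - p[a == 0].mean()) ** 2
    return bce + gamma * dp

for step in range(500):
    meta_opt.zero_grad()
    for _ in range(4):                       # meta-batch of tasks
        Xs, ys, as_ = sample_task()          # support set
        Xq, yq, aq = sample_task()           # query set
        params = [p.clone() for p in model.parameters()]
        # One inner adaptation step on the fair task loss.
        grads = torch.autograd.grad(task_loss(params, Xs, ys, as_), params)
        adapted = [p - inner_lr * g for p, g in zip(params, grads)]
        # First-order outer update: query-set gradients at the adapted
        # parameters are applied directly to the meta-parameters.
        qgrads = torch.autograd.grad(task_loss(adapted, Xq, yq, aq), adapted)
        for p, g in zip(model.parameters(), qgrads):
            p.grad = g if p.grad is None else p.grad + g
    meta_opt.step()
```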
How Much Should I Trust You? Modeling Uncertainty of Black Box Explanations
TLDR: We develop a novel set of tools for analyzing explanation uncertainty in a Bayesian framework to generate Bayesian versions of LIME and KernelSHAP that capture the uncertainty associated with each feature importance.
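One way to realize a Bayesian LIME is to replace its weighted linear fit with Bayesian linear regression over the locally weighted perturbations: the posterior mean plays the role of the feature importances and the posterior covariance supplies the uncertainty. The sketch below assumes Bayesian ridge regression and a Gaussian proximity kernel; the paper's exact likelihood, priors, and kernel may differ.

```python
# Bayesian-LIME-style sketch (assumed construction, not the paper's code).
import numpy as np
from sklearn.linear_model import BayesianRidge

def bayes_lime(black_box, x, n_samples=1000, kernel_width=0.75, rng=None):
    rng = rng or np.random.default_rng(0)
    Z = x + rng.normal(0, 1, (n_samples, x.shape[0]))   # perturbations
    fz = black_box(Z)                                   # black-box outputs
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-d**2 / kernel_width**2)                 # proximity kernel
    reg = BayesianRidge().fit(Z, fz, sample_weight=w)
    mean = reg.coef_                        # posterior mean importances
    std = np.sqrt(np.diag(reg.sigma_))      # posterior std per feature
    return mean, std

# Usage: importances with rough 95% credible intervals for a toy black box.
f = lambda Z: Z @ np.array([2.0, 0.0, -1.0])
mean, std = bayes_lime(f, np.zeros(3))
for j, (m, s) in enumerate(zip(mean, std)):
    print(f"feature {j}: {m:+.2f} ± {1.96 * s:.2f}")
```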
Expert-Assisted Transfer Reinforcement Learning
Reinforcement Learning is concerned with developing machine learning approaches to answer the question: "What should I do?" Transfer Learning attempts to use previously trained Reinforcement Learning …
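As a minimal illustration of the transfer setting (not the paper's expert-assisted mechanism, which this sketch omits), one can warm-start a target task's value estimates from a source task trained on related dynamics; the toy tabular environment below is an assumption.

```python
# Transfer-by-warm-start sketch on a toy tabular task (illustrative only).
import numpy as np

def q_learning(R, Q=None, episodes=500, alpha=0.1, gamma=0.9, eps=0.1,
               rng=np.random.default_rng(0)):
    n_states, n_actions = R.shape
    Q = np.zeros_like(R) if Q is None else Q.copy()  # transfer = warm start
    for _ in range(episodes):
        s = rng.integers(n_states)
        for _ in range(50):
            a = rng.integers(n_actions) if rng.random() < eps else Q[s].argmax()
            s2 = (s + a) % n_states                  # toy deterministic dynamics
            Q[s, a] += alpha * (R[s, a] + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q

R_source = np.random.default_rng(1).random((10, 2))
R_target = R_source + 0.1                    # a related task
Q_source = q_learning(R_source)
Q_transfer = q_learning(R_target, Q=Q_source, episodes=50)  # fewer episodes
```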
Differentially Private Language Models Benefit from Public Pre-training
TLDR: We study the feasibility of learning a language model that is simultaneously high-quality and privacy-preserving by tuning a public base model on a private corpus, which makes training such models possible.
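The training recipe reduces to differentially private fine-tuning: start from parameters learned on public data, then update on the private corpus with DP-SGD (per-example gradient clipping plus Gaussian noise). The sketch below hand-rolls DP-SGD on a toy stand-in for the public pretrained model; the paper's models, privacy accounting, and hyperparameters are not reproduced here.

```python
# DP-SGD fine-tuning sketch (toy model and data; illustrative only).
import torch

torch.manual_seed(0)
model = torch.nn.Linear(10, 2)       # stand-in for a public pretrained model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
clip, sigma = 1.0, 1.0               # clipping norm and noise multiplier

X_priv = torch.randn(256, 10)        # the "private" corpus (toy)
y_priv = torch.randint(0, 2, (256,))

for step in range(100):
    idx = torch.randint(0, len(X_priv), (32,))
    opt.zero_grad()
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for i in idx:                    # per-example gradients
        loss = torch.nn.functional.cross_entropy(
            model(X_priv[i:i + 1]), y_priv[i:i + 1])
        grads = torch.autograd.grad(loss, list(model.parameters()))
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (clip / (norm + 1e-6)).clamp(max=1.0)  # clip to norm <= clip
        for s, g in zip(summed, grads):
            s += g * scale
    for p, s in zip(model.parameters(), summed):
        noise = torch.randn_like(s) * sigma * clip     # Gaussian mechanism
        p.grad = (s + noise) / len(idx)
    opt.step()
```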