Model Reconstruction from Model Explanations

@inproceedings{Milli2018ModelRF,
  title={Model Reconstruction from Model Explanations},
  author={Smitha Milli and Ludwig Schmidt and Anca D. Dragan and Moritz Hardt},
  booktitle={FAT},
  year={2018}
}
We show through theory and experiment that gradient-based explanations of a model quickly reveal the model itself. Our results speak to a tension between the desire to keep a proprietary model secret and the ability to offer model explanations. On the theoretical side, we give an algorithm that provably learns a two-layer ReLU network in a setting where the algorithm may query the gradient of the model with respect to chosen inputs. The number of queries is independent of the dimension and…
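
The intuition behind gradient-query reconstruction is that the gradient of a two-layer ReLU network is piecewise constant along any line through input space, and each jump in the gradient exposes one hidden unit's weight vector up to sign and scale. The sketch below illustrates that idea only; it is not the paper's exact algorithm, and the network sizes, step counts, and oracle setup are illustrative assumptions.

# Minimal sketch (assumed setup, not the paper's algorithm): recover scaled
# hidden-unit weight vectors of a two-layer ReLU network from gradient queries.
import numpy as np

rng = np.random.default_rng(0)

d, h = 10, 4                      # input dimension, number of hidden units (illustrative)
W = rng.normal(size=(h, d))       # hidden-layer weights (secret)
b = rng.normal(size=h)            # hidden-layer biases (secret)
a = rng.normal(size=h)            # output-layer weights (secret)

def grad_oracle(x):
    """Gradient of f(x) = a . relu(Wx + b) w.r.t. x; the only access we assume."""
    active = (W @ x + b > 0).astype(float)   # ReLU activation pattern at x
    return (a * active) @ W                  # sum_i a_i * 1[w_i.x + b_i > 0] * w_i

# Probe gradients along a random line x(t) = x0 + t * direction.
x0 = rng.normal(size=d)
direction = rng.normal(size=d)
ts = np.linspace(-10.0, 10.0, 20001)
grads = np.array([grad_oracle(x0 + t * direction) for t in ts])

# The gradient is piecewise constant in t; each jump corresponds to one hidden
# unit's hyperplane being crossed, and the jump equals +/- a_i * w_i.
jumps = []
for g_prev, g_next in zip(grads[:-1], grads[1:]):
    diff = g_next - g_prev
    if np.linalg.norm(diff) > 1e-8:
        jumps.append(diff)

# Each recovered jump should be parallel to some row of W (up to sign and scale).
for v in jumps:
    cosines = (W @ v) / (np.linalg.norm(W, axis=1) * np.linalg.norm(v))
    print("max |cosine| with a true hidden unit:", np.abs(cosines).max())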

Citations

Publications citing this paper.

High-Fidelity Extraction of Neural Network Models

Matthew Jagielski, Nicholas Carlini, David Berthelot, Alex Kurakin, Nicolas Papernot. arXiv, 2019.

Explainable Machine Learning in Deployment

Umang Bhatt, Alice Xiang, …, Peter Eckersley. arXiv, 2019.

The Bouncer Problem: Challenges to Remote Explainability

Erwan Le Merrer, Gilles Tredan. arXiv, 2019.
