Corpus ID: 158047135

A critique of the DeepSec Platform for Security Analysis of Deep Learning Models

Author: Nicholas Carlini
At IEEE S&P 2019, DEEPSEC was presented as a uniform platform for security analysis of deep learning models. This critique argues that, among other issues, DeepSec evaluates attacks and defenses using incorrect implementations that greatly under-estimate their effectiveness; evaluates the robustness of each defense as an average over attacks, rather than against the most effective attack on that defense; performs several statistical analyses incorrectly and fails to report variance; and, as a result of these errors, draws invalid conclusions and makes sweeping generalizations.
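The objection to averaging over attacks is easy to demonstrate numerically: averaging accuracy across attacks can make a broken defense look moderately robust, while the security-relevant number is the accuracy under the strongest attack. A minimal sketch with made-up per-attack accuracies (none of these numbers are from DeepSec or the critique):

```python
import numpy as np

# Hypothetical per-attack accuracies: the fraction of adversarial inputs
# a defended model still classifies correctly. Illustrative values only.
accuracy_under_attack = {
    "FGSM": 0.85,
    "PGD": 0.10,
    "CW": 0.05,
    "DeepFool": 0.80,
}

vals = np.array(list(accuracy_under_attack.values()))

# Averaging over attacks suggests moderate robustness...
print(f"average over attacks: {vals.mean():.2f}")          # 0.45

# ...but a real adversary simply picks the strongest attack, so the
# security-relevant number is the worst case.
print(f"worst-case (strongest attack): {vals.min():.2f}")  # 0.05
```

The gap between 0.45 and 0.05 is exactly the kind of over-estimate the critique says average-based reporting produces.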

Citations

Benchmarking Adversarial Robustness
A comprehensive, rigorous, and coherent benchmark for evaluating adversarial robustness on image classification tasks is established, and several important findings are drawn that provide insights for future research.
Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples
This work defines a set of quantitative indicators that unveil common failure modes in the optimization of gradient-based attacks, and proposes specific mitigation strategies within a systematic evaluation protocol, providing a first concrete step toward automating and systematizing current adversarial robustness evaluations.
RobustBench: a standardized adversarial robustness benchmark
This work evaluates the robustness of models for its benchmark with AutoAttack, an ensemble of white- and black-box attacks which was recently shown in a large-scale study to improve almost all robustness evaluations compared to the original publications.
Taking Care of The Discretization Problem: A Comprehensive Study of the Discretization Problem and A Black-Box Adversarial Attack in Discrete Integer Domain
This work proposes a black-box method which reduces the adversarial example search problem to a derivative-free optimization problem and achieves a significantly higher success rate at finding adversarial examples in the discrete integer domain than recent black-box methods.
Confusing and Detecting ML Adversarial Attacks with Injected Attractors through Watermarking.
A generic method is given that injects attractors from a watermark decoder into the victim model, which allows it to leverage known watermarking schemes for scalability and robustness and provides explainability of the outcomes.


References
DEEPSEC: A Uniform Platform for Security Analysis of Deep Learning Model
This paper presents the design, implementation, and evaluation of DEEPSEC, a uniform platform that aims to bridge the gap in comprehensive evaluation of adversarial attacks and defenses, and demonstrates its capabilities and advantages as a benchmark platform that can benefit future adversarial learning research.
On Evaluating Adversarial Robustness
The methodological foundations are discussed, commonly accepted best practices are reviewed, and new methods for evaluating defenses to adversarial examples are suggested.
Towards Deep Learning Models Resistant to Adversarial Attacks
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
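The first-order adversary in this framing is commonly instantiated as projected gradient descent (PGD). A minimal NumPy sketch on a toy logistic model (the model, weights, and hyperparameters here are illustrative assumptions, not from the paper):

```python
import numpy as np

def pgd_linf(x, y, w, b, eps=0.3, alpha=0.05, steps=10):
    """Projected gradient descent inside an L-infinity ball of radius eps.

    Toy setting: logistic regression p = sigmoid(w @ x + b) with label
    y in {0, 1}; the input-gradient of the cross-entropy loss is (p - y) * w.
    """
    x_adv = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w @ x_adv + b)))
        grad = (p - y) * w                        # d(loss) / d(x)
        x_adv = x_adv + alpha * np.sign(grad)     # gradient-ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back onto the ball
    return x_adv

rng = np.random.default_rng(0)
w, b = rng.normal(size=4), 0.0
x, y = rng.normal(size=4), 0.0                    # true label 0
x_adv = pgd_linf(x, y, w, b)

# The perturbation stays inside the eps-ball while increasing the loss
# (for a label-0 input, the predicted probability of class 1 rises).
assert np.all(np.abs(x_adv - x) <= 0.3 + 1e-9)
```

The projection step is what distinguishes PGD from unconstrained gradient ascent: the adversary maximizes the loss subject to the perturbation budget.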
Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods
It is concluded that adversarial examples are significantly harder to detect than previously appreciated, and that the properties believed to be intrinsic to adversarial examples are in fact not.
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples
This work identifies obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples, and develops attack techniques to overcome this effect.
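One way gradient masking arises is through a non-differentiable input transformation such as quantization, whose gradient is zero almost everywhere. The circumvention idea sketched below is the straight-through trick behind BPDA (Backward Pass Differentiable Approximation): run the true defense on the forward pass, but treat it as the identity on the backward pass. All names and constants here are illustrative, not from the paper's code:

```python
import numpy as np

def quantize(x, levels=8):
    """Non-differentiable input 'defense': snap values to a coarse grid."""
    return np.round(x * (levels - 1)) / (levels - 1)

def bpda_step(x, grad_at_defended, alpha=0.05):
    """One attack step: forward through quantize(x), backward through the
    identity, so the usable gradient is simply the model's gradient
    evaluated at the defended (quantized) input."""
    return x + alpha * np.sign(grad_at_defended)

x = np.array([0.20, 0.55, 0.90])
g = np.array([1.0, -2.0, 0.5])   # illustrative model gradient at quantize(x)
x_next = bpda_step(x, g)         # approximately [0.25, 0.50, 0.95]
```

The attacker never differentiates through `quantize` at all; the zero-gradient obstacle is simply bypassed, which is why such defenses only appear robust.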
The Limitations of Deep Learning in Adversarial Settings
This work formalizes the space of adversaries against deep neural networks (DNNs) and introduces a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs.
Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples
This work proposes a novel adversarial sample detection technique for face recognition models, based on interpretability, that features a novel bi-directional correspondence inference between attributes and internal neurons to identify neurons critical for individual attributes.
Explaining and Harnessing Adversarial Examples
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
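The one-step attack derived from this linearity argument is the fast gradient sign method, x' = x + eps * sign(grad_x loss). A minimal sketch (the gradient values are made up for illustration):

```python
import numpy as np

def fgsm(x, grad_loss_x, eps=0.1):
    """Fast Gradient Sign Method: one signed step in the direction of the
    input gradient of the loss. Its effectiveness against deep models is
    what the linearity argument above explains."""
    return x + eps * np.sign(grad_loss_x)

x = np.array([0.2, -0.5, 0.9])
g = np.array([0.01, -3.0, 0.0])  # made-up input gradients
x_adv = fgsm(x, g)               # approximately [0.3, -0.6, 0.9]
```

Note that only the sign of the gradient matters, not its magnitude, so every perturbed coordinate moves by exactly eps (coordinates with zero gradient are left unchanged).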
Technical Report on the CleverHans v2.1.0 Adversarial Examples Library
CleverHans is a software library that provides standardized reference implementations of adversarial example construction techniques and adversarial training. The library may be used to develop more robust machine learning models.
Adversarial examples in the physical world
It is found that a large fraction of adversarial examples are classified incorrectly even when perceived through a camera, which shows that machine learning systems are vulnerable to adversarial examples even in physical-world scenarios.