Corpus ID: 235670132

Evading Adversarial Example Detection Defenses with Orthogonal Projected Gradient Descent

Oliver Bryniarski, Nabeel Hingun, Pedro Pachuca, Vincent Wang, Nicholas Carlini
Evading adversarial example detection defenses requires finding adversarial examples that must simultaneously (a) be misclassified by the model and (b) be detected as non-adversarial. We find that existing attacks that attempt to satisfy multiple simultaneous constraints often over-optimize against one constraint at the cost of satisfying another. We introduce Selective Projected Gradient Descent and Orthogonal Projected Gradient Descent, improved attack techniques to generate adversarial examples that avoid this problem by orthogonalizing the gradients when running standard gradient-based attacks. We use our techniques to evade four state-of-the-art detection defenses, reducing their accuracy to 0% while maintaining a 0% detection rate.
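The core idea can be illustrated with a rough sketch (not the authors' implementation): when taking an attack step against the classifier, first project out the component of the classifier-loss gradient that lies along the detector-loss gradient, so progress on one objective does not undo the other. The function below assumes NumPy arrays and hypothetical precomputed gradient inputs (`grad_cls`, `grad_det`).

```python
import numpy as np

def orthogonal_pgd_step(x, grad_cls, grad_det, step_size=0.01,
                        eps=0.03, x_orig=None):
    """One sketch of an Orthogonal PGD step: move along the classifier-loss
    gradient after projecting out its component along the detector-loss
    gradient."""
    g_c = grad_cls.ravel().astype(float)
    g_d = grad_det.ravel().astype(float)
    n = g_d @ g_d
    if n > 0.0:
        # Project g_c onto the subspace orthogonal to g_d:
        # g_c - ((g_c . g_d) / ||g_d||^2) g_d
        g_c = g_c - (g_c @ g_d) / n * g_d
    # Standard L-infinity PGD update using the sign of the gradient
    x_new = x + step_size * np.sign(g_c.reshape(x.shape))
    if x_orig is not None:
        # Stay within the epsilon ball around the original input
        x_new = np.clip(x_new, x_orig - eps, x_orig + eps)
    return np.clip(x_new, 0.0, 1.0)
```

If the two gradients are parallel, the orthogonalized step vanishes; if they are already orthogonal, the step reduces to an ordinary PGD step on the classifier loss.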


Effective and Inconspicuous Over-the-Air Adversarial Examples with Adaptive Filtering

This work demonstrates a novel audio-domain adversarial attack that modifies benign audio using an interpretable and differentiable parametric transformation - adaptive filtering, allowing adversaries to attack more effectively in challenging, real-world settings.

Increasing Confidence in Adversarial Robustness Evaluations

This paper proposes a test to identify weak attacks, and thus weak defense evaluations, with the hope that attack unit tests such as the authors' will become a major component of future robustness evaluations and increase confidence in an empirical field currently riddled with skepticism.

Detecting Adversarial Perturbations in Multi-Task Perception

A novel adversarial perturbation detection scheme based on multi-task perception of complex vision tasks (i.e., depth estimation and semantic segmentation) is presented, along with a novel edge consistency loss between all three modalities that improves their initial consistency and thereby supports the detection scheme.

Be Your Own Neighborhood: Detecting Adversarial Example by the Neighborhood Relations Built on Self-Supervised Learning

A novel AE detection framework, named BEYOND, which can easily cooperate with an adversarially trained classifier (ATC) to achieve state-of-the-art (SOTA) robust accuracy, powered by a robust relation network built on self-supervised learning (SSL).

Post-breach Recovery: Protection against White-box Adversarial Examples for Leaked DNN Models

Neo, a new system that creates new versions of leaked models alongside an inference-time filter that detects and removes adversarial examples generated on previously leaked models, is proposed and demonstrates potential as a complement to DNN defenses in the wild.

What You See is Not What the Network Infers: Detecting Adversarial Examples Based on Semantic Contradiction

This paper proposes a novel AE detection framework based on the very nature of AEs, i.e., their semantic information is inconsistent with the discriminative features extracted by the target DNN model, and shows that ContraNet outperforms existing solutions by a large margin, especially under adaptive attacks.

White-Box Attacks on Hate-speech BERT Classifiers in German with Explicit and Implicit Character Level Defense

The adversarial robustness of Bidirectional Encoder Representations from Transformers (BERT) models for German datasets is analyzed, and two novel NLP attacks, a character-level attack and a word-level attack, are introduced.

Hindi/Bengali Sentiment Analysis Using Transfer Learning and Joint Dual Input Learning with Self Attention

This work explores how to use deep neural networks in transfer learning and joint dual-input learning settings to effectively classify sentiment and detect hate speech in Hindi and Bengali data.

MagNet: A Two-Pronged Defense against Adversarial Examples

MagNet, a framework for defending neural network classifiers against adversarial examples, is proposed and it is shown empirically that MagNet is effective against the most advanced state-of-the-art attacks in blackbox and graybox scenarios without sacrificing false positive rate on normal examples.

On Adaptive Attacks to Adversarial Example Defenses

It is demonstrated that thirteen defenses recently published at ICLR, ICML, and NeurIPS, chosen for illustrative and pedagogical purposes, can be circumvented even though their original evaluations attempted to use adaptive attacks.

Detecting Adversarial Examples from Sensitivity Inconsistency of Spatial-Transform Domain

This work reveals that normal examples are insensitive to the fluctuations occurring at highly curved regions of the decision boundary, while AEs, which are typically designed over a single domain, exhibit exorbitant sensitivity to such fluctuations; it accordingly designs a second classifier with a transformed decision boundary to detect AEs.

Detection Based Defense Against Adversarial Examples From the Steganalysis Point of View

Steganalysis can be applied to adversarial example detection, and a method is proposed to enhance steganalysis features by estimating the probability of modifications caused by adversarial attacks.

Certified Robustness to Adversarial Examples with Differential Privacy

This paper presents the first certified defense that both scales to large networks and datasets and applies broadly to arbitrary model types, based on a novel connection between robustness against adversarial examples and differential privacy, a cryptographically-inspired privacy formalism.

Certified Defenses against Adversarial Examples

This work proposes a method based on a semidefinite relaxation that outputs a certificate that for a given network and test input, no attack can force the error to exceed a certain value, providing an adaptive regularizer that encourages robustness against all attacks.

Imbalanced Gradients: A New Cause of Overestimated Adversarial Robustness

12 state-of-the-art defense models are examined, and it is found that models exploiting label smoothing easily exhibit imbalanced gradients, and that attacks targeting this can decrease their measured PGD robustness by over 23%.

Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks

Two extensions of the PGD-attack overcoming failures due to suboptimal step size and problems of the objective function are proposed and combined with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness.

Adversarial Attacks on Neural Network Policies

This work shows existing adversarial example crafting techniques can be used to significantly degrade test-time performance of trained policies, even with small adversarial perturbations that do not interfere with human perception.

Detecting Adversarial Samples from Artifacts

This paper investigates model confidence on adversarial samples by looking at Bayesian uncertainty estimates, available in dropout neural networks, and by performing density estimation in the subspace of deep features learned by the model, and results show a method for implicit adversarial detection that is oblivious to the attack algorithm.