Corpus ID: 61153607

The Odds are Odd: A Statistical Test for Detecting Adversarial Examples

@inproceedings{Roth2019TheOA,
  title={The Odds are Odd: A Statistical Test for Detecting Adversarial Examples},
  author={Kevin Roth and Yannic Kilcher and Thomas Hofmann},
  booktitle={ICML},
  year={2019}
}
We investigate conditions under which test statistics exist that can reliably detect examples that have been adversarially manipulated in a white-box attack. These statistics can be easily computed and calibrated by randomly corrupting inputs. They exploit certain anomalies that adversarial attacks introduce, in particular when the attacks follow the paradigm of choosing perturbations optimally under p-norm constraints. Access to the log-odds is the only requirement to defend models. We justify our… 
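The abstract sketches a test built on how the class log-odds react to randomly corrupted copies of the input. Below is a minimal Python sketch of that idea, assuming a generic `logits_fn` classifier and a fixed per-class threshold vector; the paper's full procedure is more elaborate (e.g., standardizing the statistic and calibrating thresholds on held-out clean data).

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_perturbed_logodds(logits_fn, x, num_samples=64, sigma=0.1):
    """Average shift of the pairwise log-odds under random input corruption.

    logits_fn is a stand-in for the trained classifier (any callable mapping
    a batch of inputs to class logits); sigma and num_samples are illustrative.
    """
    clean = logits_fn(x[None])[0]
    y = int(np.argmax(clean))                        # predicted class on the clean input
    noise = sigma * rng.standard_normal((num_samples,) + x.shape)
    noisy = logits_fn(x[None] + noise)
    g_clean = clean - clean[y]                       # log-odds of each class z against y
    g_noisy = noisy - noisy[:, [y]]
    return (g_noisy - g_clean).mean(axis=0), y       # expected noise-induced shift per class

def looks_adversarial(logits_fn, x, thresholds):
    """Flag x if any noise-induced log-odds shift exceeds its threshold.

    thresholds would be calibrated on held-out clean examples (e.g. a high
    percentile of the same statistic); a fixed vector is used here for brevity.
    """
    shift, y = noise_perturbed_logodds(logits_fn, x)
    shift[y] = -np.inf                               # the predicted class itself is ignored
    return bool(np.any(shift > thresholds))

# Toy usage with a random linear "classifier" over 10 classes and 32 features.
W = rng.standard_normal((10, 32))
logits_fn = lambda batch: batch @ W.T
x = rng.standard_normal(32)
print(looks_adversarial(logits_fn, x, thresholds=np.full(10, 2.0)))
```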

Citations

Are Odds Really Odd? Bypassing Statistical Detection of Adversarial Examples
TLDR
This paper develops a classifier-based adaptation of the statistical test method, shows that it improves detection performance, and proposes a Logit Mimicry Attack that generates adversarial examples whose logits mimic those of benign images.
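As a rough illustration of the logit-mimicry idea, an attacker can optimize a bounded perturbation so that the model's logits approach a benign logit profile. The PyTorch sketch below is not the cited paper's exact attack; the model, loss, step size, and budget are illustrative placeholders.

```python
import torch

def logit_mimicry_attack(model, x, benign_logits, eps=0.03, steps=40, lr=0.01):
    """PGD-style sketch: push the logits of x + delta toward a benign logit
    profile while keeping delta inside an L-infinity ball of radius eps."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = torch.nn.functional.mse_loss(model(x + delta), benign_logits)
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()          # descend on the mimicry loss
            delta.clamp_(-eps, eps)                  # respect the perturbation budget
        delta.grad.zero_()
    return (x + delta).detach()

# Toy usage with a linear stand-in classifier.
model = torch.nn.Linear(32, 10)
x, benign = torch.randn(1, 32), torch.randn(1, 32)
x_adv = logit_mimicry_attack(model, x, model(benign).detach())
```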
Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them
TLDR
A hardness reduction between detection and classification of adversarial examples is proved, providing a useful sanity check for whether empirical detection results imply something much stronger than the authors presumably anticipated (namely, a highly robust and data-efficient classifier).
Provably robust classification of adversarial examples with detection
TLDR
This paper proposes a new method for jointly training a provably robust classifier and detector, and shows that the method outperforms traditional IBP used in isolation, especially for large perturbation sizes.
Divide-and-Conquer Adversarial Detection
TLDR
This paper trains adversary-robust auxiliary detectors to discriminate in-class natural examples from adversarially crafted out-of-class examples, and demonstrates that, with the novel training scheme, their models learn significantly more robust representations than ordinary adversarial training.
Attack Agnostic Detection of Adversarial Examples via Random Subspace Analysis
TLDR
This work presents a technique that utilizes properties of random projections to characterize the behavior of clean and adversarial examples across a diverse set of subspaces and demonstrates that this technique outperforms competing detection strategies while remaining truly agnostic to the attack strategy.
Random Projections for Adversarial Attack Detection
TLDR
This work presents a technique that exploits special properties of random projections to characterize the behavior of clean and adversarial examples across a diverse set of subspaces, and outperforms competing state-of-the-art detection strategies while remaining truly agnostic to the attack method itself.
Defending Adversarial Attacks by Correcting logits
TLDR
This work relies purely on logits, the class scores before softmax, to detect and defend against adversarial attacks, training a two-layer network on a mixed set of clean and perturbed logits with the goal of recovering the original prediction.
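A minimal sketch of what such a logit-correcting module could look like, assuming PyTorch and a 10-class problem; the hidden width, activation, and training recipe are illustrative rather than the cited paper's exact choices.

```python
import torch
from torch import nn

class LogitCorrector(nn.Module):
    """Two-layer network mapping (possibly attacked) logits to corrected class
    scores; hidden size and activation are illustrative assumptions."""
    def __init__(self, num_classes=10, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, logits):
        return self.net(logits)

# Training would mix clean and perturbed logits and supervise with the clean
# model's prediction, e.g.:
#   loss = nn.functional.cross_entropy(corrector(logits), clean_labels)
```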
Increasing Confidence in Adversarial Robustness Evaluations
TLDR
This paper proposes a test to identify weak attacks, and thus weak defense evaluations, and hopes that attack unit tests such as the authors' will be a major component in future robustness evaluations and increase confidence in an empirical environment currently riddled with skepticism.
Adversarial Example Detection in Deployed Tree Ensembles
TLDR
This work presents a novel method for adversarial example detection in tree ensembles that works by analyzing an unseen example's output configuration, i.e., the set of predictions made by the ensemble's constituent trees.
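To make the notion of an output configuration concrete, the sketch below uses scikit-learn's RandomForestClassifier as a stand-in ensemble and scores a configuration by its agreement with the majority vote; the cited work scores configurations differently, so this is only an illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def output_configuration(forest, X):
    """Per-example vector of the individual trees' predicted (encoded) classes."""
    return np.stack([tree.predict(X) for tree in forest.estimators_], axis=1).astype(int)

def agreement_score(config):
    """Fraction of trees agreeing with the majority vote; unusually low
    agreement hints that an example may be adversarial."""
    majority = np.array([np.bincount(row).argmax() for row in config])
    return (config == majority[:, None]).mean(axis=1)

# Usage sketch (X_train, y_train, X_test are assumed to exist):
#   forest = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
#   scores = agreement_score(output_configuration(forest, X_test))
```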
Adversarial Example Detection and Classification With Asymmetrical Adversarial Training
TLDR
This paper presents an adversarial example detection method that provides a performance guarantee against norm-constrained adversaries, and uses learned class-conditional generative models to define generative detection/classification models that are both robust and more interpretable.
...

References

Showing 1-10 of 36 references
Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods
TLDR
It is concluded that adversarial examples are significantly harder to detect than previously appreciated, and that the properties believed to be intrinsic to adversarial examples are in fact not.
Adversarial vulnerability for any classifier
TLDR
This paper derives fundamental upper bounds on the robustness to perturbation of any classification function, and proves the existence of adversarial perturbations that transfer well across different classifiers with small risk.
The best defense is a good offense: Countering black box attacks by predicting slightly wrong labels
TLDR
This work takes the perspective of the defender and presents a method that can successfully avoid model theft by mounting a counter-attack against black-box attacks on machine learning models.
Detecting Adversarial Samples from Artifacts
TLDR
This paper investigates model confidence on adversarial samples by looking at Bayesian uncertainty estimates, available in dropout neural networks, and by performing density estimation in the subspace of deep features learned by the model, and the results yield a method for implicit adversarial detection that is oblivious to the attack algorithm.
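The dropout-uncertainty half of that recipe can be sketched in PyTorch by keeping dropout active at test time and measuring the spread of the predictions; the cited method additionally fits a density estimate on deep features of clean training data and thresholds both scores. The network below is a placeholder.

```python
import torch
from torch import nn

def mc_dropout_uncertainty(model, x, passes=30):
    """Monte-Carlo dropout: run several stochastic forward passes and return
    the total variance of the softmax outputs (higher = more suspicious)."""
    model.train()                                    # keeps dropout layers active
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(passes)])
    model.eval()
    return probs.var(dim=0).sum(dim=-1)

# Toy usage with a placeholder network containing dropout.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Dropout(0.5), nn.Linear(64, 10))
x = torch.randn(1, 32)
print(mc_dropout_uncertainty(model, x))
```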
The Space of Transferable Adversarial Examples
TLDR
It is found that adversarial examples span a contiguous subspace of large (~25) dimensionality, which indicates that it may be possible to design defenses against transfer-based attacks, even for models that are vulnerable to direct attacks.
PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples
Adversarial perturbations of normal images are usually imperceptible to humans, but they can seriously confuse state-of-the-art machine learning models. What makes them so special in the eyes of…
On Detecting Adversarial Perturbations
TLDR
It is shown empirically that adversarial perturbations can be detected surprisingly well even though they are quasi-imperceptible to humans.
Evasion Attacks against Machine Learning at Test Time
TLDR
This work presents a simple but effective gradient-based approach that can be exploited to systematically assess the security of several, widely-used classification algorithms against evasion attacks.
On the (Statistical) Detection of Adversarial Examples
TLDR
It is shown that statistical properties of adversarial examples are essential to their detection: they are not drawn from the same distribution as the original data and can thus be detected using statistical tests.
Towards Deep Learning Models Resistant to Adversarial Attacks
TLDR
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
...