Benford's law: what does it say on adversarial images?

João G. Zago, Fabio L. Baldissera, Eric A. Antonelo, Rodrigo T. Saad
Convolutional neural networks (CNNs) are fragile to small perturbations in the input images. These networks are thus prone to malicious attacks that perturb the inputs to force a misclassification. Such slightly manipulated images aimed at deceiving the classifier are known as adversarial images. In this work, we investigate statistical differences between natural images and adversarial ones. More precisely, we show that employing a proper image transformation and for a class of adversarial… 
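As a rough illustration of the kind of statistic involved, the sketch below compares the leading-digit frequencies of a set of numbers against Benford's law, which predicts P(d) = log10(1 + 1/d) for leading digit d = 1..9. The image transformation used in the paper is not given in this excerpt, so the code takes an arbitrary list of numbers as input. The function names and the total-variation distance used to measure deviation are assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def benford_expected():
    """Benford's law: P(d) = log10(1 + 1/d) for leading digit d in 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant decimal digit of a nonzero finite number."""
    # Scientific notation always puts the first significant digit first.
    return int(f"{float(abs(x)):.15e}"[0])


def first_digit_distribution(values):
    """Empirical leading-digit frequencies, skipping zeros and non-finite values."""
    digits = [
        leading_digit(v)
        for v in values
        if v != 0 and math.isfinite(float(v))
    ]
    if not digits:
        raise ValueError("no nonzero finite values to analyse")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def benford_distance(values):
    """Total-variation distance between observed digits and Benford's law.

    Hypothetical deviation measure chosen for this sketch; 0 means a perfect match.
    """
    observed = first_digit_distribution(values)
    expected = benford_expected()
    return 0.5 * sum(abs(observed[d] - expected[d]) for d in range(1, 10))


if __name__ == "__main__":
    # Powers of two are a classic sequence that follows Benford's law closely.
    benford_like = [2 ** k for k in range(1000)]
    # Consecutive integers have uniform leading digits and deviate strongly.
    uniform_like = list(range(1, 1000))
    print("powers of two:", round(benford_distance(benford_like), 4))
    print("integers 1..999:", round(benford_distance(uniform_like), 4))
```

In a detection setting, `values` would be the outputs of whatever transformation is applied to an image, and a large distance would flag the image as statistically unusual. Any decision threshold would have to be calibrated on data.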

PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples

Adversarial perturbations of normal images are usually imperceptible to humans, but they can seriously confuse state-of-the-art machine learning models. What makes them so special in the eyes of

On Detecting Adversarial Perturbations

It is shown empirically that adversarial perturbations can be detected surprisingly well even though they are quasi-imperceptible to humans.

Towards Deep Learning Models Resistant to Adversarial Attacks

This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.

On the (Statistical) Detection of Adversarial Examples

It is shown that the statistical properties of adversarial examples are essential to their detection: they are not drawn from the same distribution as the original data and can thus be detected using statistical tests.

Adversarial Machine Learning at Scale

This research applies adversarial training to ImageNet, finds that single-step attacks are the best for mounting black-box attacks, and resolves a "label leaking" effect that causes adversarially trained models to perform better on adversarial examples than on clean examples.

Ensemble Adversarial Training: Attacks and Defenses

This work finds that adversarial training remains vulnerable to black-box attacks, where perturbations computed on undefended models are transferred, as well as to a powerful novel single-step attack that escapes the non-smooth vicinity of the input data via a small random step.

Explaining and Harnessing Adversarial Examples

It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.

Safety Verification of Deep Neural Networks

A novel automated verification framework for feed-forward multi-layer neural networks based on Satisfiability Modulo Theory (SMT) is developed, which defines safety for an individual decision in terms of invariance of the classification within a small neighbourhood of the original image.

Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks

The study shows that defensive distillation can reduce the effectiveness of sample creation from 95% to less than 0.5% on a studied DNN, and analytically investigates the generalizability and robustness properties granted by the use of defensive distillation when training DNNs.

Sample Based Fast Adversarial Attack Method

A new fast black-box adversarial attack algorithm is proposed that is based purely on data samples and generates comparable adversarial samples much faster than classical attack algorithms.