# A Research Agenda: Dynamic Models to Defend Against Correlated Attacks

    @article{Goodfellow2019ARA,
      title   = {A Research Agenda: Dynamic Models to Defend Against Correlated Attacks},
      author  = {I. Goodfellow},
      journal = {ArXiv},
      year    = {2019},
      volume  = {abs/1903.06293}
    }

In this article I describe a research agenda for securing machine learning models against adversarial inputs at test time. This article does not present results but instead shares some of my thoughts about where I think that the field needs to go. Modern machine learning works very well on I.I.D. data: data for which each example is drawn *independently* and for which the distribution generating each example is *identical*. When these assumptions are relaxed, modern machine learning can…
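The contrast between i.i.d. evaluation and correlated, adaptive attacks can be illustrated with a toy sketch. Everything here is invented for illustration: a fixed-threshold detector scores perfectly on independently drawn inputs, while an attacker whose each query depends on the previous response locates the decision boundary and evades it.

```python
import random

def classifier(x, threshold=0.5):
    # Toy "model": flags an input as malicious when its score exceeds a fixed threshold.
    return x > threshold

# I.I.D. evaluation: inputs drawn independently from one fixed distribution.
random.seed(0)
iid_inputs = [random.uniform(0.6, 1.0) for _ in range(1000)]  # malicious-looking scores
iid_detection_rate = sum(classifier(x) for x in iid_inputs) / len(iid_inputs)

# Correlated (adaptive) attacker: each query depends on the outcome of the
# previous one, violating independence. The attacker binary-searches the
# decision boundary and keeps the highest-scoring input that still evades.
lo, hi = 0.0, 1.0
for _ in range(20):
    mid = (lo + hi) / 2
    if classifier(mid):
        hi = mid   # still detected: move toward the benign region
    else:
        lo = mid   # evaded: remember this input
evading_input = lo

print(iid_detection_rate)         # perfect on i.i.d. data
print(classifier(evading_input))  # False: the adaptive attacker evades
```

The same static model that looks flawless under the i.i.d. assumption is defeated in a handful of correlated queries, which is the gap this agenda targets.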


#### 12 Citations

Testing Robustness Against Unforeseen Adversaries

- Computer Science, Mathematics
- ArXiv
- 2019

This work introduces a total of four novel adversarial attacks to create ImageNet-UA's diverse attack suite, and demonstrates that, in comparison to ImageNet-UA, prevailing L_inf robustness assessments give a narrow account of model robustness.

Adaptive Generation of Unrestricted Adversarial Inputs

- Computer Science
- 2019

This work introduces a novel algorithm for generating unrestricted adversarial inputs which is adaptive: it is able to tune its attacks to the classifier being targeted, and offers a 400-2,000x speedup over the existing state of the art.

Hidden Incentives for Auto-Induced Distributional Shift

- Computer Science, Mathematics
- ArXiv
- 2020

The term auto-induced distributional shift (ADS) is introduced to describe the phenomenon of an algorithm causing a change in the distribution of its own inputs, with the goal of ensuring that machine learning systems do not leverage ADS to increase performance when doing so could be undesirable.

Hidden Incentives for Self-Induced Distributional Shift

- 2019

Decisions made by machine learning systems have increasing influence on the world. Yet it is common for machine learning algorithms to assume that no such influence exists. An example is the use of…

Fighting Gradients with Gradients: Dynamic Defenses against Adversarial Attacks

- Computer Science
- ArXiv
- 2021

This work proposes dynamic defenses that adapt the model and input during testing via defensive entropy minimization (dent); dent improves the robustness of adversarially-trained defenses and nominally-trained models against white-box, black-box, and adaptive attacks on CIFAR-10/100 and ImageNet.
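The test-time entropy-minimization idea summarized above can be sketched in miniature. This is a hypothetical toy, not the paper's implementation: a hand-written linear softmax model whose weights are updated on a single test input to reduce the entropy of its own prediction. The weights, input, and learning rate are invented, and the actual dent method adapts only normalization/affine parameters of a deep network.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p)

def forward(W, x):
    # Linear model: logits z = W @ x, turned into class probabilities.
    return softmax([sum(wjk * xk for wjk, xk in zip(wj, x)) for wj in W])

def dent_step(W, x, lr=0.5):
    """One entropy-minimization step, using the closed-form gradient
    dH/dz_j = -p_j (log p_j + H) and dH/dW_jk = dH/dz_j * x_k."""
    p = forward(W, x)
    H = entropy(p)
    dH_dz = [-pj * (math.log(pj) + H) for pj in p]
    # Gradient descent on the entropy sharpens the model's own prediction.
    return [[wjk - lr * g * xk for wjk, xk in zip(wj, x)]
            for wj, g in zip(W, dH_dz)]

x = [1.0, 0.5]                              # a single test-time input
W = [[0.2, 0.1], [0.1, 0.2], [0.15, 0.15]]  # 3-class model, near-uniform output
H_before = entropy(forward(W, x))
for _ in range(10):
    W = dent_step(W, x)
H_after = entropy(forward(W, x))
print(H_before, H_after)  # prediction entropy drops as the model adapts
```

The point of the sketch is only the mechanism: the defense changes the model at test time in response to the very input being classified, which is what makes it "dynamic" in the sense of the article's agenda.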

Generating Realistic Unrestricted Adversarial Inputs using Dual-Objective GAN Training

- Computer Science, Mathematics
- ArXiv
- 2019

This work introduces a novel algorithm to generate realistic unrestricted adversarial inputs, in the sense that they cannot reliably be distinguished from the training dataset by a human, and finds that human judges are unable to identify which image out of ten was generated by the method about 50 percent of the time.

Closeness and Uncertainty Aware Adversarial Examples Detection in Adversarial Machine Learning

- Computer Science
- ArXiv
- 2020

This work explores and assesses two groups of metrics for detecting adversarial samples: those based on uncertainty estimation using Monte-Carlo dropout sampling, and those based on closeness measures in the subspace of deep features extracted by the model.

Towards Adversarial Robustness via Transductive Learning

- Computer Science
- ArXiv
- 2021

This paper formalizes and analyzes modeling aspects of transductive robustness, proposes the principle of attacking model space for solving bilevel attack objectives, and presents an instantiation of the principle that breaks previous transductive defenses.

Anomalous Instance Detection in Deep Learning: A Survey

- Computer Science, Mathematics
- ArXiv
- 2020

A taxonomy of existing techniques, based on their underlying assumptions and adopted approaches, is provided; the techniques in each category are discussed along with their relative strengths and weaknesses.

Robust Semantic Segmentation by Redundant Networks With a Layer-Specific Loss Contribution and Majority Vote

- Computer Science
- 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
- 2020

This work proposes a novel error detection and correction scheme with application to semantic segmentation, which obtains its robustness from an online-adapted and therefore hard-to-attack student DNN during vehicle operation, building upon a novel layer-dependent inverse feature matching (IFM) loss.

#### References

Showing 1–10 of 23 references

Motivating the Rules of the Game for Adversarial Example Research

- Computer Science, Mathematics
- ArXiv
- 2018

It is argued that adversarial example defense papers have, to date, mostly considered abstract, toy games that do not relate to any specific security concern, and a taxonomy of motivations, constraints, and abilities for more plausible adversaries is established.

Towards Deep Learning Models Resistant to Adversarial Attacks

- Computer Science, Mathematics
- ICLR
- 2018

This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
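The first-order adversary in this robust-optimization view can be illustrated with a small projected-gradient-descent (PGD) sketch on a linear logistic model. Everything here (weights, input, epsilon, step size) is invented for illustration, and the gradient is computed by hand rather than with an autodiff framework.

```python
import math

def loss(w, x, y):
    # Logistic loss for a linear classifier; label y is in {-1, +1}.
    return math.log(1.0 + math.exp(-y * sum(wi * xi for wi, xi in zip(w, x))))

def pgd_attack(w, x0, y, eps=0.3, alpha=0.1, steps=10):
    """Inner maximization of the robust objective: repeatedly ascend the
    loss in the input's sign-gradient direction, projecting back into the
    L_inf ball of radius eps around the original input x0."""
    x = list(x0)
    for _ in range(steps):
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        coeff = -y / (1.0 + math.exp(margin))  # d(loss)/d(w.x)
        grad = [coeff * wi for wi in w]        # d(loss)/dx by the chain rule
        x = [min(x0i + eps, max(x0i - eps, xi + alpha * (1 if g > 0 else -1)))
             for xi, x0i, g in zip(x, x0, grad)]
    return x

w = [1.0, -2.0]
x0, y = [0.5, -0.5], 1  # correctly classified: w.x0 = 1.5 > 0
x_adv = pgd_attack(w, x0, y)
print(loss(w, x0, y), loss(w, x_adv, y))  # adversarial loss is strictly higher
```

Adversarial training in the Madry et al. sense then minimizes the classifier's loss on `x_adv` rather than on `x0`, approximating the min-max objective.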

Adversarial examples in the physical world

- Computer Science, Mathematics
- ICLR
- 2017

It is found that a large fraction of adversarial examples are classified incorrectly even when perceived through a camera, which shows that machine learning systems are vulnerable to adversarial examples even in physical-world scenarios.

Unrestricted Adversarial Examples

- Mathematics, Computer Science
- ArXiv
- 2018

This work introduces a two-player contest for evaluating the safety and robustness of machine learning systems, with a large prize pool, and shifts the focus to unconstrained adversaries.

Certified Defenses against Adversarial Examples

- Computer Science, Mathematics
- ICLR
- 2018

This work proposes a method based on a semidefinite relaxation that outputs a certificate that for a given network and test input, no attack can force the error to exceed a certain value, providing an adaptive regularizer that encourages robustness against all attacks.

Provable defenses against adversarial examples via the convex outer adversarial polytope

- Computer Science, Mathematics
- ICML
- 2018

A method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations, and it is shown that the dual problem to this linear program can be represented itself as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss.

Training verified learners with learned verifiers

- Computer Science, Mathematics
- ArXiv
- 2018

Experiments show that the predictor-verifier architecture is able to train networks to state-of-the-art verified robustness to adversarial examples with much shorter training times, and can be scaled to produce the first known verifiably robust networks for CIFAR-10.

Delving into Transferable Adversarial Examples and Black-box Attacks

- Computer Science
- ICLR
- 2017

This work is the first to conduct an extensive study of transferability over large models and a large-scale dataset, and the first to study the transferability of targeted adversarial examples with their target labels.

On Evaluating Adversarial Robustness

- Mathematics, Computer Science
- ArXiv
- 2019

The methodological foundations are discussed, commonly accepted best practices are reviewed, and new methods for evaluating defenses to adversarial examples are suggested.

Detecting Adversarial Samples from Artifacts

- Computer Science, Mathematics
- ArXiv
- 2017

This paper investigates model confidence on adversarial samples by looking at Bayesian uncertainty estimates, available in dropout neural networks, and by performing density estimation in the subspace of deep features learned by the model, and results show a method for implicit adversarial detection that is oblivious to the attack algorithm.
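The uncertainty-based half of this detection approach can be sketched with a toy Monte-Carlo-dropout loop. The weights, inputs, dropout rate, and pass count below are all invented, and a real detector would combine this score with the density estimate the paper also uses.

```python
import math
import random

def mc_dropout_uncertainty(weights, x, rate=0.5, passes=200, seed=0):
    """Uncertainty score from Monte-Carlo dropout: keep dropout active at
    test time, run several stochastic forward passes of a toy linear
    sigmoid model, and return the mean prediction and its variance."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(passes):
        # Drop each feature independently; scale survivors to preserve the mean.
        dropped = [xi / (1.0 - rate) if rng.random() >= rate else 0.0 for xi in x]
        z = sum(wi * di for wi, di in zip(weights, dropped))
        outputs.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid prediction
    mean = sum(outputs) / passes
    var = sum((o - mean) ** 2 for o in outputs) / passes
    return mean, var

w = [1.0, 1.0, 1.0, 1.0]
clean = [1.0, 1.0, 1.0, 1.0]        # features that agree with each other
brittle = [10.0, -9.0, 10.0, -9.0]  # adversarial-style cancelling features
_, var_clean = mc_dropout_uncertainty(w, clean)
_, var_brittle = mc_dropout_uncertainty(w, brittle)
print(var_clean, var_brittle)
```

The input built from large cancelling features yields much higher predictive variance under dropout than the clean one, which is the signal such detectors threshold on.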