Corpus ID: 80628335

A Research Agenda: Dynamic Models to Defend Against Correlated Attacks

@article{Goodfellow2019ARA,
  title={A Research Agenda: Dynamic Models to Defend Against Correlated Attacks},
  author={I. Goodfellow},
  journal={ArXiv},
  year={2019},
  volume={abs/1903.06293}
}
In this article I describe a research agenda for securing machine learning models against adversarial inputs at test time. This article does not present results but instead shares some of my thoughts about where I think that the field needs to go. Modern machine learning works very well on I.I.D. data: data for which each example is drawn independently and for which the distribution generating each example is identical. When these assumptions are relaxed, modern machine learning can…
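To make the contrast concrete, here is a minimal, purely illustrative Python sketch (all names are hypothetical, not from the paper) of I.I.D. test inputs versus a correlated attacker whose every query depends on the model's previous responses, which is the setting this agenda targets.

```python
import numpy as np

rng = np.random.default_rng(0)

def iid_inputs(n, dim):
    """I.I.D. setting: every test input is drawn independently from the
    same fixed distribution, with no dependence on the model."""
    return rng.normal(size=(n, dim))

def correlated_attack_inputs(model, x0, n_steps, step=0.1):
    """Correlated setting (illustrative only): each query is chosen based
    on the model's response to the previous query, so test inputs are
    neither independent nor identically distributed with the training data."""
    x = np.array(x0, dtype=float)
    queries = []
    for _ in range(n_steps):
        score = model(x)                           # attacker observes the model's output
        x = x + step * np.sign(score) * rng.normal(size=x.shape)
        queries.append(x.copy())                   # the next query correlates with this one
    return np.stack(queries)
```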

Paper Mentions

Testing Robustness Against Unforeseen Adversaries
TLDR: This work introduces a total of four novel adversarial attacks to create ImageNet-UA's diverse attack suite, and demonstrates that, in comparison to ImageNet-UA, prevailing L_inf robustness assessments give a narrow account of model robustness.
Adaptive Generation of Unrestricted Adversarial Inputs
TLDR: This work introduces a novel algorithm for generating unrestricted adversarial inputs which is adaptive: it is able to tune its attacks to the classifier being targeted, and offers a 400-2,000x speedup over the existing state of the art.
Hidden Incentives for Auto-Induced Distributional Shift
TLDR: The term auto-induced distributional shift (ADS) is introduced to describe the phenomenon of an algorithm causing a change in the distribution of its own inputs; the goal is to ensure that machine learning systems do not leverage ADS to increase performance when doing so could be undesirable.
Hidden Incentives for Self-Induced Distributional Shift
Decisions made by machine learning systems have increasing influence on the world. Yet it is common for machine learning algorithms to assume that no such influence exists. An example is the use of…
Fighting Gradients with Gradients: Dynamic Defenses against Adversarial Attacks
TLDR: This work proposes dynamic defenses that adapt the model and input during testing via defensive entropy minimization (dent); dent improves the robustness of adversarially-trained defenses and nominally-trained models against white-box, black-box, and adaptive attacks on CIFAR-10/100 and ImageNet.
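As an illustrative note, a single test-time entropy-minimization step in the spirit of dent might look like the following PyTorch sketch; this is not the authors' implementation, and `model`, `x`, and `optimizer` are assumed to be supplied by the caller.

```python
import torch
import torch.nn.functional as F

def entropy_minimization_step(model, x, optimizer):
    """One test-time adaptation step: update (a subset of) model parameters
    to reduce prediction entropy on the incoming batch. Sketch only."""
    optimizer.zero_grad()
    logits = model(x)
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=1).mean()
    entropy.backward()      # adapt the model toward more confident predictions
    optimizer.step()
    return logits.detach()
```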
Generating Realistic Unrestricted Adversarial Inputs using Dual-Objective GAN Training
TLDR: This work introduces a novel algorithm to generate realistic unrestricted adversarial inputs, in the sense that they cannot reliably be distinguished from the training dataset by a human, and finds that human judges are unable to identify which image out of ten was generated by the method about 50 percent of the time.
Closeness and Uncertainty Aware Adversarial Examples Detection in Adversarial Machine Learning
TLDR: This work explores and assesses the use of two different groups of metrics for detecting adversarial samples: those based on uncertainty estimation using Monte-Carlo Dropout Sampling, and those based on closeness measures in the subspace of deep features extracted by the model.
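For reference, the Monte-Carlo Dropout half of such a detector can be sketched as below (PyTorch, illustrative only; the closeness-based metrics are omitted here).

```python
import torch

def mc_dropout_uncertainty(model, x, n_samples=20):
    """Monte-Carlo Dropout sketch: keep dropout active at test time, run
    several stochastic forward passes, and use the variance of the softmax
    outputs as a per-example uncertainty score."""
    model.train()  # keeps dropout layers active during the forward passes
    with torch.no_grad():
        probs = torch.stack([
            torch.softmax(model(x), dim=1) for _ in range(n_samples)
        ])
    model.eval()
    return probs.var(dim=0).sum(dim=1)  # higher value => more uncertain
```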
Towards Adversarial Robustness via Transductive Learning
  • Jiefeng Chen, Yang Guo, +4 authors S. Jha
  • Computer Science
  • ArXiv
  • 2021
TLDR: This paper formalizes and analyzes modeling aspects of transductive robustness, proposes the principle of attacking model space for solving bilevel attack objectives, and presents an instantiation of the principle which breaks previous transductive defenses.
Anomalous Instance Detection in Deep Learning: A Survey
TLDR: A taxonomy of existing techniques is provided based on their underlying assumptions and adopted approaches; the techniques in each category are discussed, along with the relative strengths and weaknesses of the approaches.
Robust Semantic Segmentation by Redundant Networks With a Layer-Specific Loss Contribution and Majority Vote
TLDR: This work proposes a novel error detection and correction scheme with application to semantic segmentation, which obtains its robustness from an online-adapted, and therefore hard-to-attack, student DNN during vehicle operation, building upon a novel layer-dependent inverse feature matching (IFM) loss.

References

Showing 1-10 of 23 references
Motivating the Rules of the Game for Adversarial Example Research
TLDR: It is argued that adversarial example defense papers have, to date, mostly considered abstract, toy games that do not relate to any specific security concern, and a taxonomy of motivations, constraints, and abilities for more plausible adversaries is established.
Towards Deep Learning Models Resistant to Adversarial Attacks
TLDR: This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
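The "first-order adversary" in this robust-optimization view is typically instantiated with projected gradient descent (PGD); the following is a hedged PyTorch sketch of an L_inf PGD attack, with hyperparameters that are illustrative rather than the paper's.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Illustrative L_inf PGD adversary: repeatedly ascend the loss with
    signed gradients and project back into the eps-ball around x."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # gradient ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project into the eps-ball
            x_adv = x_adv.clamp(0, 1)                  # stay in valid pixel range
    return x_adv.detach()
```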
Adversarial examples in the physical world
TLDR: It is found that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera, which shows that even in physical-world scenarios, machine learning systems are vulnerable to adversarial examples.
Unrestricted Adversarial Examples
TLDR: This work introduces a two-player contest for evaluating the safety and robustness of machine learning systems, with a large prize pool, and shifts the focus to unconstrained adversaries.
Certified Defenses against Adversarial Examples
TLDR: This work proposes a method based on a semidefinite relaxation that outputs a certificate that, for a given network and test input, no attack can force the error to exceed a certain value, providing an adaptive regularizer that encourages robustness against all attacks.
Provable defenses against adversarial examples via the convex outer adversarial polytope
TLDR: A method is presented to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations; the dual of the resulting linear program can itself be represented as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss.
Training verified learners with learned verifiers
TLDR: Experiments show that the predictor-verifier architecture trains networks to state-of-the-art verified robustness to adversarial examples with much shorter training times, and can be scaled to produce the first known verifiably robust networks for CIFAR-10.
Delving into Transferable Adversarial Examples and Black-box Attacks
TLDR: This work is the first to conduct an extensive study of transferability over large models and a large-scale dataset, and it is also the first to study the transferability of targeted adversarial examples with their target labels.
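As a hedged illustration of transfer-based black-box attacks in general (not this paper's specific protocol), one can craft examples against a surrogate model and measure how often they also fool a separate target model, for instance by reusing the PGD sketch above.

```python
import torch

def transfer_attack_success(surrogate, target, x, y, attack):
    """Black-box transfer sketch: craft adversarial examples against a
    surrogate model, then measure how often they also fool the target."""
    x_adv = attack(surrogate, x, y)            # e.g. the pgd_linf sketch above
    with torch.no_grad():
        preds = target(x_adv).argmax(dim=1)
    return (preds != y).float().mean().item()  # fraction misclassified by the target
```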
On Evaluating Adversarial Robustness
TLDR: The methodological foundations are discussed, commonly accepted best practices are reviewed, and new methods for evaluating defenses to adversarial examples are suggested.
Detecting Adversarial Samples from Artifacts
TLDR: This paper investigates model confidence on adversarial samples by looking at Bayesian uncertainty estimates, available in dropout neural networks, and by performing density estimation in the subspace of deep features learned by the model; the results yield a method for implicit adversarial detection that is oblivious to the attack algorithm.
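The density-estimation half of this idea can be sketched roughly as follows (scikit-learn, illustrative assumptions only: one kernel density estimate per class over deep features of clean data, with the detection threshold tuned on held-out data).

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def fit_class_densities(features, labels, bandwidth=1.0):
    """Fit one kernel density estimate per class on deep features
    extracted from clean training data."""
    return {c: KernelDensity(bandwidth=bandwidth).fit(features[labels == c])
            for c in np.unique(labels)}

def density_score(kdes, feature, predicted_class):
    """Low log-density under the predicted class's KDE flags a sample as
    potentially adversarial (threshold chosen on validation data)."""
    return kdes[predicted_class].score_samples(feature.reshape(1, -1))[0]
```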