Corpus ID: 213183269

RAB: Provable Robustness Against Backdoor Attacks

Authors: Maurice Weber, Xiaojun Xu, Bojan Karlas, Ce Zhang, Bo Li
Recent studies have shown that deep neural networks are highly vulnerable to adversarial attacks, including evasion and backdoor attacks. On the defense side, there has been intensive interest in provable robustness against evasion attacks, while robustness guarantees against backdoor attacks are still lacking. In this paper, we focus on certifying model robustness against general threat models. We first provide a unified framework via randomized smoothing and show it can be instantiated to certify…
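As a rough illustration of the training-set smoothing idea behind this line of work (not the paper's actual algorithm or its certified bound), the sketch below trains an ensemble on independently noise-perturbed copies of the training data and aggregates by majority vote; the `fit` callback, ensemble size, and noise scale are all hypothetical choices:

```python
import random
from collections import Counter

def rab_smoothed_predict(train_set, x, fit, n_models=20, sigma=0.5, seed=0):
    """Training-set smoothing sketch: train many models, each on an
    independently Gaussian-perturbed copy of the training data, and take a
    majority vote. Aggregating over the randomness is what makes certificates
    against bounded backdoor perturbations of the training set possible."""
    votes = Counter()
    for i in range(n_models):
        rng = random.Random(seed + i)
        noisy = [([xi + rng.gauss(0.0, sigma) for xi in xs], y)
                 for xs, y in train_set]
        model = fit(noisy)          # user-supplied training routine
        votes[model(x)] += 1
    return votes.most_common(1)[0][0]
```

A toy `fit` (e.g. a nearest-centroid classifier) suffices to see the vote stabilize across the noisy ensembles.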

Flareon: Stealthy any2any Backdoor Injection via Poisoned Augmentation

Flareon is proposed: a small, stealthy, seemingly harmless code modification that targets the data augmentation pipeline with motion-based triggers, without assuming prior knowledge of the victim model architecture or training data.

Identifying a Training-Set Attack's Target Using Renormalized Influence Estimation

This work proposes the task of target identification, which determines whether a specific test instance is the target of a training-set attack, and builds on influence estimation, which quantifies each training instance's contribution to a model's prediction.

Deep Partition Aggregation: Provable Defense against General Poisoning Attacks

Deep Partition Aggregation (DPA) is a certified defense against a general poisoning threat model; its semi-supervised variant, SS-DPA, uses a semi-supervised learning algorithm as its base classifier and outperforms the existing certified defense for label-flipping attacks, establishing new state-of-the-art provable defenses against poisoning attacks.

Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review

This work provides the community with a timely, comprehensive review of backdoor attacks and countermeasures on deep learning, and identifies key areas for future research, such as empirical security evaluations against physical trigger attacks and more efficient and practical countermeasures.

Wild Patterns Reloaded: A Survey of Machine Learning Security against Training Data Poisoning

The success of machine learning is fueled by the increasing availability of computing power and large training datasets. The training data is used to learn new models or update existing ones…

Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses

The goal of this work is to systematically categorize and discuss a wide range of dataset vulnerabilities and exploits, approaches for defending against these threats, and an array of open problems in this space.

Chaos Theory and Adversarial Robustness

Neural networks, being susceptible to adversarial attacks, should face a strict level of scrutiny before being deployed in critical or adversarial applications. This paper uses ideas from Chaos Theory…

Uncovering the Connection Between Differential Privacy and Certified Robustness of Federated Learning against Poisoning Attacks

This paper investigates both user-level and instance-level privacy of federated learning (FL), proposes novel mechanisms to achieve improved instance-level privacy, and proves the certified robustness of DPFL under a bounded number of adversarial users or instances.

Turning a Curse Into a Blessing: Enabling Clean-Data-Free Defenses by Model Inversion

An algorithmic framework that can mitigate potential security vulnerabilities in a pre-trained model when clean data from its training distribution is unavailable to the defender is introduced.

COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks

This work proposes the first certification framework, COPA, to certify the number of poisoning trajectories that can be tolerated regarding different certification criteria, and proposes two certification criteria: per-state action stability and cumulative reward bound.

Certified Adversarial Robustness via Randomized Smoothing

Strong empirical results suggest that randomized smoothing is a promising direction for future research into adversarially robust classification; on smaller-scale datasets where competing approaches to certified $\ell_2$ robustness are viable, smoothing delivers higher certified accuracies.
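A minimal Monte-Carlo sketch of the randomized-smoothing prediction and radius, under the standard simplification that the certified $\ell_2$ radius is $\sigma\,\Phi^{-1}(p_A)$ for a lower bound $p_A$ on the top-class probability; the crude probability cap below stands in for the binomial confidence bound a real certificate requires:

```python
import random
from collections import Counter
from statistics import NormalDist

def smoothed_predict(base_classifier, x, sigma=0.5, n=1000, seed=0):
    """Randomized smoothing by Monte Carlo: vote over the base classifier's
    outputs on Gaussian-perturbed copies of x, then convert the estimated
    top-class probability into a certified L2 radius via the Gaussian CDF."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n):
        votes[base_classifier([xi + rng.gauss(0.0, sigma) for xi in x])] += 1
    top, n_top = votes.most_common(1)[0]
    p_a = min(n_top / n, 1 - 1e-3)  # crude cap; use a binomial lower bound in practice
    radius = sigma * NormalDist().inv_cdf(p_a) if p_a > 0.5 else 0.0
    return top, radius
```

Any deterministic `base_classifier` callable works; the radius grows with both the noise level and the vote margin.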

Randomized Smoothing of All Shapes and Sizes

It is shown that with only label statistics under random input perturbations, randomized smoothing cannot achieve nontrivial certified accuracy against perturbations of $\ell_p$-norm $\Omega(\min(1, d^{\frac{1}{p} - \frac{1}{2}}))$, when the input dimension $d$ is large.

Spectral Signatures in Backdoor Attacks

Spectral signatures are identified as a new property of all known backdoor attacks, which allows tools from robust statistics to thwart the attacks; the efficacy of these signatures in detecting and removing poisoned examples is demonstrated on real image sets and state-of-the-art neural network architectures.
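The spectral-signature defense reduces to an outlier score: project each (centered) learned representation onto the top singular direction of the representation matrix and flag the largest projections. A minimal sketch, with the removal fraction a hypothetical hyperparameter:

```python
import numpy as np

def spectral_scores(reps):
    """Outlier score per example: squared projection of its mean-centered
    representation onto the top right-singular vector of the representation
    matrix (poisoned examples tend to align with this direction)."""
    centered = reps - reps.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return (centered @ vt[0]) ** 2

def remove_suspects(reps, frac=0.1):
    """Return indices kept after dropping the top-scoring fraction."""
    scores = spectral_scores(reps)
    k = int(len(reps) * frac)
    return np.sort(np.argsort(scores)[: len(reps) - k])
```

In practice `reps` would be penultimate-layer activations of all training examples sharing a label, computed from the possibly-backdoored model.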

Certified Robustness to Adversarial Examples with Differential Privacy

This paper presents the first certified defense that both scales to large networks and datasets and applies broadly to arbitrary model types, based on a novel connection between robustness against adversarial examples and differential privacy, a cryptographically-inspired privacy formalism.

Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

This work considers a new type of attack, called a backdoor attack, where the attacker's goal is to create a backdoor into a learning-based authentication system so that they can easily circumvent the system by leveraging the backdoor.

Deep Learning with Differential Privacy

This work develops new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy, and demonstrates that deep neural networks can be trained with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.
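The DP-SGD recipe at the heart of this line of work is per-example gradient clipping plus Gaussian noise calibrated to the clipping norm. A minimal single-step sketch (gradients supplied externally; the learning rate, clip norm, and noise multiplier are illustrative defaults, and the privacy accounting is omitted):

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip=1.0, sigma=1.0, rng=None):
    """One DP-SGD step: clip each per-example gradient to L2 norm <= clip,
    sum, add Gaussian noise with scale sigma * clip, average, and descend.
    Clipping bounds any single example's influence; the noise hides it."""
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    noisy_mean = (np.sum(clipped, axis=0)
                  + rng.normal(0.0, sigma * clip, size=params.shape)) / len(clipped)
    return params - lr * noisy_mean
```

With `sigma=0` the step reduces to plain clipped SGD, which makes the clipping behavior easy to verify in isolation.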

Certified Robustness to Label-Flipping Attacks via Randomized Smoothing

This work presents a unifying view of randomized smoothing over arbitrary functions, and uses this novel characterization to propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.

Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering

This work proposes a novel approach to backdoor detection and removal for neural networks; it is the first methodology capable of detecting poisonous data crafted to insert backdoors and repairing the model without requiring a verified and trusted dataset.
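The detection step can be sketched as 2-means clustering of one class's activations, flagging the markedly smaller cluster as candidate poison. This is a simplified stand-in (the paper clusters after dimensionality reduction and uses more careful cluster analysis); the initialization heuristic here is hypothetical:

```python
import numpy as np

def activation_clustering(acts, iters=20):
    """Split one class's penultimate-layer activations into two clusters with
    plain 2-means and return the indices of the smaller cluster, which in a
    poisoned class tends to contain the trigger-bearing examples."""
    mu = acts.mean(axis=0)
    d = np.linalg.norm(acts - mu, axis=1)
    # initialize one center far from the class mean, one near it
    centers = acts[[d.argmax(), d.argmin()]].astype(float)
    for _ in range(iters):
        assign = np.argmin(
            [np.linalg.norm(acts - c, axis=1) for c in centers], axis=0)
        for j in range(2):
            if np.any(assign == j):
                centers[j] = acts[assign == j].mean(axis=0)
    small = int(np.bincount(assign, minlength=2).argmin())
    return np.where(assign == small)[0]
```

Run per class label; a clean class typically yields two clusters of comparable size, so the size gap itself is the detection signal.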

Backdoor Attacks on Black-Box Ciphers Exploiting Low-Entropy Plaintexts

A new design is proposed that eliminates the need for known plaintext entirely and employs "data compression" as a basic tool for generating a hidden information channel, highlighting the need to only encrypt compressed strings when a block cipher with a secret design must be used.

BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

It is shown that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a backdoored neural network, or a BadNet) that has state-of-the-art performance on the user's training and validation samples, but behaves badly on specific attacker-chosen inputs.