Corpus ID: 247292239

Uncertify: Attacks Against Neural Network Certification

Tobias Lorenz, Marta Z. Kwiatkowska, Mario Fritz
A key concept for reliable, robust, and safe AI systems is to implement fallback strategies for when an AI's predictions cannot be trusted. Certifiers for neural networks have made great progress towards provable robustness guarantees against evasion attacks using adversarial examples. These methods guarantee, for some predictions, that a certain class of manipulations or attacks could not have changed the outcome. For the remaining predictions without guarantees, the method abstains…




SoK: Certified Robustness for Deep Neural Networks

This paper provides a taxonomy for the robustness verification and training approaches, and provides an open-sourced unified platform to evaluate 20+ representative verification and corresponding robust training approaches on a wide range of DNNs.

On Certifying Robustness against Backdoor Attacks via Randomized Smoothing

It is found that existing randomized smoothing methods have limited effectiveness at defending against backdoor attacks, which highlights the need for new theory and methods to certify robustness against backdoor attacks.

Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks

Fine-pruning, a combination of pruning and fine-tuning, is evaluated and shown to successfully weaken or even eliminate backdoors, in some cases reducing the attack success rate to 0% with only a 0.4% drop in accuracy on clean (non-triggering) inputs.
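The pruning half of this defense can be sketched as follows. This is a toy NumPy illustration, not the authors' implementation: the layer size, activation data, and pruning count are invented. The idea is that backdoor neurons tend to be dormant on clean inputs, so the least-active neurons are pruned before fine-tuning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy last hidden layer: activations of 8 neurons on 100 clean inputs.
# Neurons 2 and 5 are made artificially dormant on clean data, mimicking
# backdoor neurons that only fire on triggered inputs.
clean_activations = rng.uniform(0.5, 1.0, size=(100, 8))
clean_activations[:, [2, 5]] = rng.uniform(0.0, 0.05, size=(100, 2))

mean_act = clean_activations.mean(axis=0)
n_prune = 2
pruned = np.argsort(mean_act)[:n_prune]  # indices of the least-active neurons

mask = np.ones(8, dtype=bool)
mask[pruned] = False  # zero these neurons' outputs, then fine-tune the rest
print(sorted(pruned.tolist()))
```

In the full defense, the masked network is then fine-tuned on clean data to recover any accuracy lost to pruning.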

How Robust are Randomized Smoothing based Defenses to Data Poisoning?

This work proposes a novel bilevel optimization based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers and highlights the importance of training-data quality in achieving high certified adversarial robustness.

Towards Deep Learning Models Resistant to Adversarial Attacks

This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
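The first-order adversary studied in this work is typically instantiated as projected gradient descent (PGD). Below is a minimal NumPy sketch of an L-infinity PGD attack on a toy linear classifier; the model, step sizes, and budget are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def pgd_linear(x, y, w, b, eps=0.5, alpha=0.1, steps=20):
    """L-inf PGD against a linear score f(x) = w.x + b with label y in {-1,+1}.

    Ascends the loss by stepping along the sign of its input gradient,
    projecting back into the eps-ball around the original input each step.
    """
    x_adv = x.copy()
    for _ in range(steps):
        grad = -y * w  # gradient of the loss -y * (w.x + b) w.r.t. x
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
    return x_adv

w = np.array([1.0, -2.0])
b = 0.0
x = np.array([1.0, 0.0])      # clean point with label +1, score w.x + b = 1.0
x_adv = pgd_linear(x, +1, w, b)
print(w @ x + b, w @ x_adv + b)  # the adversarial score is driven negative
```

For a linear model one PGD step already reaches the worst case in the ball; the iteration matters for the non-linear networks the paper considers.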

Februus: Input Purification Defense Against Trojan Attacks on Deep Neural Network Systems

To the best of the authors' knowledge, Februus is the first backdoor defense capable of sanitizing Trojaned inputs at run-time without requiring anomaly detection methods, model retraining, or costly labeled data.

Semidefinite relaxations for certifying robustness to adversarial examples

A new semidefinite relaxation for certifying robustness that applies to arbitrary ReLU networks is proposed, and it is shown to be tighter than previous relaxations, producing meaningful robustness guarantees on three different networks whose training objectives are agnostic to the proposed relaxation.

Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks

This work presents the first robust and generalizable detection and mitigation system for DNN backdoor attacks, and identifies multiple mitigation techniques via input filters, neuron pruning and unlearning.

Model Agnostic Defence against Backdoor Attacks in Machine Learning

This article presents Neo, a model-agnostic framework to detect and mitigate backdoor attacks in image classification ML models, and shows that, despite being a black-box approach, Neo is more effective at thwarting backdoor attacks than existing techniques.

Certified Robustness to Adversarial Examples with Differential Privacy

This paper presents the first certified defense that both scales to large networks and datasets and applies broadly to arbitrary model types, based on a novel connection between robustness against adversarial examples and differential privacy, a cryptographically-inspired privacy formalism.
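The noise-based prediction underlying this line of certified defenses can be sketched as a majority vote of a base classifier under Gaussian input noise; a prediction that is stable under noise admits a certified radius, while an unstable one leads the certifier to abstain. The base classifier and noise level below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma=0.25, n=1000, seed=0):
    """Majority vote of the base classifier over Gaussian perturbations of x.

    Returns the winning class and the raw vote counts; the vote margin is
    what a certifier would turn into a certified radius (or an abstention).
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n,) + x.shape)
    votes = np.array([base_classifier(x + eta) for eta in noise])
    counts = np.bincount(votes, minlength=2)
    return int(counts.argmax()), counts

# Toy binary base classifier: class 1 iff the input's mean exceeds 0.
base = lambda z: int(z.mean() > 0.0)

label, counts = smoothed_predict(base, np.full(4, 0.5))
print(label, counts)  # class 1 wins nearly all votes on this stable input
```

An input near the decision boundary would instead split the votes, which is exactly the case where a certifier abstains rather than issuing a guarantee.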