# Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation

```bibtex
@article{Huang2019AchievingVR,
  title   = {Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation},
  author  = {Po-Sen Huang and Robert Stanforth and Johannes Welbl and Chris Dyer and Dani Yogatama and Sven Gowal and Krishnamurthy Dvijotham and Pushmeet Kohli},
  journal = {ArXiv},
  year    = {2019},
  volume  = {abs/1909.01492}
}
```

Neural networks are part of many contemporary NLP systems, yet their empirical successes come at the price of vulnerability to adversarial attacks. Previous work has used adversarial training and data augmentation to partially mitigate such brittleness, but these are unlikely to find worst-case adversaries due to the complexity of the search space arising from discrete text perturbations. In this work, we approach the problem from the opposite direction: to formally verify a system’s robustness…
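The interval bound propagation technique named in the title can be illustrated with a minimal numpy sketch. This is not the paper's implementation (which bounds the simplex of word-substitution embeddings); the function names and the toy two-layer network below are purely illustrative:

```python
import numpy as np

def interval_affine(lo, hi, W, b):
    """Propagate an interval [lo, hi] through x -> W @ x + b.

    Standard IBP decomposition: the interval center passes through the
    affine map directly; the radius is scaled by the elementwise |W|.
    """
    center = (lo + hi) / 2.0
    radius = (hi - lo) / 2.0
    new_center = W @ center + b
    new_radius = np.abs(W) @ radius
    return new_center - new_radius, new_center + new_radius

def interval_relu(lo, hi):
    """ReLU is monotone, so it maps interval endpoints directly."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

# Toy example: bound a 2-layer network's outputs over an input interval.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

x = np.array([0.5, -0.2, 0.1])
eps = 0.1
lo, hi = x - eps, x + eps
lo, hi = interval_relu(*interval_affine(lo, hi, W1, b1))
lo, hi = interval_affine(lo, hi, W2, b2)
# Every input within eps of x is guaranteed to produce an output in [lo, hi].
```

If the worst-case logit gap computed from these bounds still favors the correct class, the prediction is verified robust for the whole input region, with no search over perturbations.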


## 95 Citations

Certified Robustness to Word Substitution Attack with Differential Privacy

- Computer Science
- NAACL
- 2021

This paper establishes the connection between differential privacy (DP) and adversarial robustness for the first time in the text domain, and proposes a conceptual exponential mechanism-based algorithm to formally achieve robustness.

Quantifying Robustness to Adversarial Word Substitutions

- Computer Science
- ArXiv
- 2022

A robustness metric with a rigorous statistical guarantee is introduced to quantify a model's susceptibility to perturbations outside the safe radius. It helps explain why state-of-the-art models like BERT can be easily fooled by a few word substitutions, yet generalize well in the presence of real-world noise.

Achieving Model Robustness through Discrete Adversarial Training

- Computer Science
- EMNLP
- 2021

Surprisingly, it is found that random sampling leads to impressive gains in robustness, outperforming the commonly-used offline augmentation, while leading to a speedup at training time of ~10x.

BERT is Robust! A Case Against Synonym-Based Adversarial Examples in Text Classification

- Computer Science
- ArXiv
- 2021

This paper investigates four word substitution-based attacks on BERT and concludes that BERT is a lot more robust than research on attacks suggests.

Combating Adversarial Typos

- Computer Science
- 2019

Despite achieving excellent benchmark performance, state-of-the-art NLP models can still be easily fooled by adversarial perturbations such as typos. Previous heuristic defenses cannot guard against…

T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack

- Computer Science
- EMNLP
- 2020

T3-generated adversarial texts can successfully manipulate NLP models into outputting a targeted incorrect answer without misleading humans, and exhibit high transferability, which enables black-box attacks in practice.

Robust Encodings: A Framework for Combating Adversarial Typos

- Computer Science
- ACL
- 2020

This work introduces robust encodings (RobEn), a simple framework that confers guaranteed robustness, without making compromises on model architecture, and instantiates RobEn to defend against a large family of adversarial typos.

SAFER: A Structure-free Approach for Certified Robustness to Adversarial Word Substitutions

- Computer Science
- ACL
- 2020

This work proposes a certified robust method based on a new randomized smoothing technique, which constructs a stochastic ensemble by applying random word substitutions to the input sentences and leverages the statistical properties of the ensemble to provably certify robustness.
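The smoothing-by-substitution idea can be sketched as a majority vote over randomly perturbed inputs. This is a hypothetical toy, assuming a black-box `classify` function and a synonym table; it is not the SAFER implementation, which additionally derives a statistical certificate from the vote counts:

```python
import random
from collections import Counter

def smoothed_predict(classify, tokens, synonyms, n_samples=100, seed=0):
    """Majority vote of a base classifier over random synonym substitutions.

    `classify` maps a token list to a label; `synonyms` maps a word to its
    list of allowed substitutes (including the word itself). The ensemble's
    statistics, not the base model's internals, carry the guarantee.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        perturbed = [rng.choice(synonyms.get(t, [t])) for t in tokens]
        votes[classify(perturbed)] += 1
    return votes.most_common(1)[0][0]

# Toy usage: a keyword classifier that is stable under these substitutions.
synonyms = {"good": ["good", "great", "fine"], "movie": ["movie", "film"]}
classify = lambda toks: "pos" if any(t in {"good", "great", "fine"} for t in toks) else "neg"
label = smoothed_predict(classify, ["good", "movie"], synonyms)
```

Because smoothing only needs query access to the base model, it is "structure-free": the certificate does not depend on the network architecture.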


Certified Robustness Against Natural Language Attacks by Causal Intervention

- Computer Science
- ICML
- 2022

Causal Intervention by Semantic Smoothing (CISS), a novel framework for robustness against natural language attacks, learns causal effects p(y | do(x)) by smoothing in the latent semantic space to make robust predictions; it scales to deep architectures and avoids the tedious construction of noise customized for specific attacks.

## References

Showing 1-10 of 49 references

On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models

- Computer Science
- ArXiv
- 2018

This work shows how a simple bounding technique, interval bound propagation (IBP), can be exploited to train large provably robust neural networks that beat the state-of-the-art in verified accuracy and allows the largest model to be verified beyond vacuous bounds on a downscaled version of ImageNet.

Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples

- Computer Science
- AAAI
- 2020

This paper proposes a projected gradient method combined with group lasso and gradient regularization for sequence-to-sequence (seq2seq) models, whose inputs are discrete text strings and outputs have an almost infinite number of possibilities.

Provable defenses against adversarial examples via the convex outer adversarial polytope

- Computer Science
- ICML
- 2018

A method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations. The dual of the resulting linear program can itself be represented as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss.

Knowing When to Stop: Evaluation and Verification of Conformity to Output-Size Specifications

- Computer Science
- 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019

This paper develops an easy-to-compute differentiable proxy objective that can be used with gradient-based algorithms to find output-lengthening inputs and develops a verification approach to formally prove that the network cannot produce outputs greater than a certain length.

Formal Security Analysis of Neural Networks using Symbolic Intervals

- Computer Science
- USENIX Security Symposium
- 2018

This paper designs, implements, and evaluates a new direction for formally checking security properties of DNNs without using SMT solvers, and leverages interval arithmetic to compute rigorous bounds on the DNN outputs, which is easily parallelizable.

Training verified learners with learned verifiers

- Computer Science
- ArXiv
- 2018

Experiments show that the predictor-verifier architecture is able to train networks to state-of-the-art verified robustness against adversarial examples with much shorter training times, and can be scaled to produce the first known verifiably robust networks for CIFAR-10.

Provably Minimally-Distorted Adversarial Examples

- Computer Science
- 2017

It is demonstrated that one of the recent ICLR defense proposals, adversarial retraining, provably succeeds at increasing the distortion required to construct adversarial examples by a factor of 4.2.

Towards Deep Learning Models Resistant to Adversarial Attacks

- Computer Science
- ICLR
- 2018

This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
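The first-order adversary studied in that work is typically instantiated as projected gradient descent (PGD) inside an L-infinity ball. A minimal numpy sketch, with illustrative names and a toy loss rather than a trained network:

```python
import numpy as np

def pgd_attack(grad_fn, x, eps, alpha=0.01, steps=40):
    """Projected gradient ascent on the loss within an L-infinity ball.

    `grad_fn(x)` returns the gradient of the loss w.r.t. the input. Each
    step moves in the sign of the gradient, then projects back into the
    eps-ball around the original input.
    """
    x0 = x.copy()
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x0 - eps, x0 + eps)
    return x_adv

# Toy usage: maximize L(x) = ||x||^2 around x0 = [0.5] with eps = 0.1.
grad_fn = lambda x: 2.0 * x  # gradient of the squared norm
x0 = np.array([0.5])
x_adv = pgd_attack(grad_fn, x0, eps=0.1)
# The iterate climbs to the ball's boundary, where the toy loss is largest.
```

Adversarial training in the robust-optimization view simply runs this inner maximization to generate training inputs, then minimizes the loss on them.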

Certified Defenses against Adversarial Examples

- Computer Science
- ICLR
- 2018

This work proposes a method based on a semidefinite relaxation that outputs a certificate that for a given network and test input, no attack can force the error to exceed a certain value, providing an adaptive regularizer that encourages robustness against all attacks.

Ground-Truth Adversarial Examples

- Computer Science
- ArXiv
- 2017

Ground truths are constructed: adversarial examples with provably minimal distance from a given input point. They can serve to assess the effectiveness of both attack and defense techniques, by computing the distance to the ground truths before and after a defense is applied and measuring the improvement.