Corpus ID: 231879652

Defense Against Reward Poisoning Attacks in Reinforcement Learning

@article{Banihashem2021DefenseAR,
  title={Defense Against Reward Poisoning Attacks in Reinforcement Learning},
  author={Kiarash Banihashem and Adish Kumar Singla and Goran Radanovic},
  journal={ArXiv},
  year={2021},
  volume={abs/2102.05776}
}
We study defense strategies against reward poisoning attacks in reinforcement learning. As a threat model, we consider attacks that minimally alter rewards to make the attacker's target policy uniquely optimal under the poisoned rewards, with the optimality gap specified by an attack parameter. Our goal is to design agents that are robust against such attacks in terms of worst-case utility w.r.t. the true, unpoisoned rewards, while computing their policies under the poisoned rewards. We…
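
Written out (with assumed notation, not quoted from the paper), this threat model is the attacker's optimization problem:

% Sketch of the attack's problem; R is the true reward, \widehat{R} the
% poisoned reward, \pi^\dagger the target policy, and \epsilon^\dagger > 0
% the attack parameter that sets the optimality gap.
\begin{align*}
\min_{\widehat{R}} \quad & \lVert \widehat{R} - R \rVert \\
\text{s.t.} \quad & Q^{*}_{\widehat{R}}\bigl(s, \pi^\dagger(s)\bigr) \;\ge\; Q^{*}_{\widehat{R}}(s, a) + \epsilon^\dagger \qquad \forall s,\ \forall a \neq \pi^\dagger(s),
\end{align*}

where $Q^{*}_{\widehat{R}}$ denotes optimal action values under $\widehat{R}$; the defender only observes $\widehat{R}$ but is evaluated under the true $R$.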

Citations

COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks
TLDR
This work proposes the first certification framework, COPA, to certify the number of poisoning trajectories that can be tolerated, under two proposed certification criteria: per-state action stability and a cumulative reward bound.
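
A minimal sketch of the partition-and-aggregate idea behind a per-state action-stability certificate, assuming policies trained on disjoint subsets of the offline trajectories (the function and its tie-breaking are illustrative, not COPA's actual interface):

from collections import Counter

def certify_action(policies, state):
    """Majority-vote an action across policies trained on disjoint
    partitions of the offline data; return the action and a bound on how
    many poisoned trajectories the vote at this state can tolerate."""
    votes = Counter(pi(state) for pi in policies)
    (top_action, top_count), *rest = votes.most_common()
    runner_count = rest[0][1] if rest else 0
    # Each poisoned trajectory lands in one partition, so flipping the
    # vote requires corrupting more than half of the count gap.
    return top_action, (top_count - runner_count) // 2
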
Policy Teaching in Reinforcement Learning via Environment Poisoning Attacks
TLDR
The results show that the attacker can easily succeed in teaching any target policy to the victim under mild conditions and highlight a significant security threat to reinforcement learning agents in practice.
A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian Process Bandits
TLDR
This work proposes a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants; the algorithm is shown to be robust against a variety of adversarial attacks.
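
A finite-arm simplification of that epoch structure (the paper's algorithm works with Gaussian process bandits and kernelized confidence bounds; the constants and names here are assumptions):

import math

def robust_phased_elimination(arms, pull, horizon, corruption_budget):
    """Epoch-based elimination: play each surviving arm in one long block
    (infrequent switching), then drop arms whose empirical mean falls
    outside a corruption-inflated confidence radius of the best arm."""
    active, t, epoch = list(arms), 0, 1
    while t < horizon and len(active) > 1:
        m = 2 ** epoch                          # plays per arm this epoch
        means = {}
        for a in active:
            means[a] = sum(pull(a) for _ in range(m)) / m
            t += m
        best = max(means.values())
        # Inflate the radius by the per-sample corruption an adversary
        # with total budget C could inject into this epoch.
        radius = (math.sqrt(2 * math.log(max(horizon, 2)) / m)
                  + corruption_budget / m)
        active = [a for a in active if means[a] >= best - 2 * radius]
        epoch += 1
    return active
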
Admissible Policy Teaching through Reward Design
TLDR
This paper shows that the reward design problem for admissible policy teaching is computationally challenging: it is NP-hard to find an approximately optimal reward modification. It then formulates a surrogate problem whose optimal solution approximates the optimal solution to the reward design problem in this setting, but is more amenable to optimization techniques and analysis.
Wild Patterns Reloaded: A Survey of Machine Learning Security against Training Data Poisoning
The success of machine learning is fueled by the increasing availability of computing power and large training datasets. The training data is used to learn new models or update existing ones…

References

Showing 1-10 of 61 references
Markov Decision Processes: Discrete Stochastic Dynamic Programming
  • M. Puterman
  • Wiley Series in Probability and Statistics, 1994
TLDR
Markov Decision Processes covers recent research advances in such areas as countable state-space models with the average reward criterion, constrained models, and models with risk-sensitive optimality criteria, and it explores several topics that have received little or no attention in other books.
Robust Deep Reinforcement Learning against Adversarial Perturbations on Observations
TLDR
The proposed training procedure significantly improves the robustness of DQN and DDPG agents under a suite of strong white-box attacks on observations, including several novel attacks the authors specifically craft.
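
One common way to realize this kind of adversarial training, sketched for DQN with an FGSM-style observation perturbation (the paper's actual procedure and regularizer differ; tensor shapes, e.g. a as a (B, 1) index tensor, are assumptions):

import torch
import torch.nn.functional as F

def robust_td_loss(q_net, target_net, s, a, r, s2, done,
                   eps=0.01, gamma=0.99):
    # Craft a worst-case observation inside an l_inf ball of radius eps
    # by lowering the value of the action that was actually taken.
    s_adv = s.clone().detach().requires_grad_(True)
    q_taken = q_net(s_adv).gather(1, a).sum()
    grad, = torch.autograd.grad(q_taken, s_adv)
    s_adv = (s - eps * grad.sign()).detach()

    # Standard DQN targets; only the online network sees the perturbation.
    q = q_net(s_adv).gather(1, a).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_net(s2).max(dim=1).values
    return F.smooth_l1_loss(q, target)
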
Policy Poisoning in Batch Reinforcement Learning and Control
TLDR
This work presents a unified framework for batch policy poisoning attacks and instantiates the attack on two standard victims: a tabular certainty-equivalence learner in reinforcement learning and a linear quadratic regulator in control.
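
For the tabular certainty-equivalence victim, such an attack can be posed as a linear program, since policy values are linear in the reward table. The sketch below is built from standard MDP facts and is not the paper's code; all names are hypothetical:

import numpy as np
from scipy.optimize import linprog

def poison_batch_rewards(P, R, pi_t, gamma=0.9, eps=0.1):
    """Minimally (in l1 norm) modify the reward table R so that pi_t is
    optimal with margin eps in the MDP (P, R_hat) that a certainty-
    equivalence learner estimates. P: (S, A, S), R: (S, A)."""
    S, A = R.shape
    n = S * A
    # V^{pi_t} is linear in R_hat: V = (I - gamma * P_pi)^{-1} R_pi.
    P_pi = P[np.arange(S), pi_t]                   # (S, S)
    M = np.linalg.inv(np.eye(S) - gamma * P_pi)

    def q_row(s, a):
        # Coefficients of Q^{pi_t}(s, a) as a linear function of R_hat.
        row = np.zeros(n)
        row[s * A + a] += 1.0
        w = gamma * P[s, a] @ M                    # weight of each V(s')
        for s2 in range(S):
            row[s2 * A + pi_t[s2]] += w[s2]
        return row

    # Variables: [R_hat (n), t (n)]; minimize sum(t) with |R_hat - R| <= t.
    c = np.concatenate([np.zeros(n), np.ones(n)])
    A_ub, b_ub = [], []
    for s in range(S):
        for a in range(A):
            if a == pi_t[s]:
                continue
            # Margin constraint: Q(s, a) - Q(s, pi_t(s)) <= -eps.
            A_ub.append(np.concatenate([q_row(s, a) - q_row(s, pi_t[s]),
                                        np.zeros(n)]))
            b_ub.append(-eps)
    I = np.eye(n)
    A_ub += list(np.hstack([I, -I])) + list(np.hstack([-I, -I]))
    b_ub += list(R.ravel()) + list(-R.ravel())
    res = linprog(c, A_ub=np.asarray(A_ub), b_ub=np.asarray(b_ub),
                  bounds=[(None, None)] * n + [(0, None)] * n)
    return res.x[:n].reshape(S, A)   # res.success is False if infeasible
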
Policy Teaching in Reinforcement Learning via Environment Poisoning Attacks
TLDR
The results show that the attacker can easily succeed in teaching any target policy to the victim under mild conditions and highlight a significant security threat to reinforcement learning agents in practice.
Adaptive Reward-Poisoning Attacks against Reinforcement Learning
TLDR
It is shown that, under mild conditions, adaptive attacks can achieve the nefarious policy in a number of steps polynomial in the state-space size $|S|$, whereas non-adaptive attacks require exponentially many steps.
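
The adaptivity is the key: the perturbation depends on the victim's current state-action pair. A deliberately crude sketch (the paper's attacks are computed far more carefully, e.g. via an attacker-side MDP):

def adaptive_poison(state, action, reward, target_policy, delta=1.0):
    """Perturb the reward the victim observes as a function of its
    current (state, action): reward the nefarious target action and
    punish everything else."""
    if action == target_policy[state]:
        return reward + delta
    return reward - delta

Placed in front of a tabular Q-learning update, this steers the victim toward target_policy once delta dominates the true reward range.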
Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning
TLDR
The results show that the attacker can easily succeed in teaching any target policy to the victim under mild conditions and highlight a significant security threat to reinforcement learning agents in practice.
Stronger Data Poisoning Attacks Break Data Sanitization Defenses
TLDR
Three new attacks are developed that can bypass a broad range of data sanitization defenses, including commonly used anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition.
Corruption Robust Exploration in Episodic Reinforcement Learning
TLDR
This work provides the first sublinear regret guarantee which accommodates any deviation from purely i.i.d. transitions in the bandit-feedback model for episodic reinforcement learning, and derives results for both tabular and linear-function-approximation settings.
Robust Policy Gradient against Strong Data Corruption
TLDR
A Filtered Policy Gradient (FPG) algorithm is developed that can tolerate even unbounded reward corruption and find an O(ε)-optimal policy; it is emphasized that FPG is the first algorithm to achieve a meaningful learning guarantee when a constant fraction of episodes is corrupted.
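
A minimal sketch of the filtering idea, aggregating per-episode gradient estimates robustly before the policy update (FPG's actual filter is derived for its setting; this is illustrative):

import numpy as np

def filtered_gradient(episode_grads, keep=0.9):
    """Discard the episodes whose gradient estimates lie farthest from a
    robust center, then average the rest; with a small corrupted
    fraction, the filtered mean stays close to the clean gradient."""
    G = np.stack(episode_grads)            # (episodes, params)
    center = np.median(G, axis=0)          # robust initial center
    dist = np.linalg.norm(G - center, axis=1)
    kept = G[np.argsort(dist)[:int(keep * len(G))]]
    return kept.mean(axis=0)
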
Stochastic Linear Bandits Robust to Adversarial Attacks
TLDR
In the contextual setting, a setup with diverse contexts is revisited, and it is shown that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing the corruption budget $C$.
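
The greedy rule itself is one line of algebra: a ridge estimate of the unknown parameter followed by pure exploitation, with context diversity (rather than exploration) keeping the estimate well-conditioned. A sketch with assumed names:

import numpy as np

def greedy_action(X_hist, r_hist, arms, lam=1.0):
    """Ridge-regression estimate of theta from past (feature, reward)
    pairs, then pick the arm with the highest estimated reward."""
    X = np.asarray(X_hist)                 # (t, d) chosen feature vectors
    r = np.asarray(r_hist)                 # (t,) observed rewards
    theta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ r)
    return int(np.argmax(arms @ theta))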