Game Redesign in No-regret Game Playing

Yuzhe Ma, Young Wu, Xiaojin Zhu
We study the game redesign problem in which an external designer has the ability to change the payoff function in each round, but incurs a design cost for deviating from the original game. The players apply no-regret learning algorithms to repeatedly play the changed games with limited feedback. The goals of the designer are to (i) incentivize players to take a specific target action profile frequently; (ii) incur small cumulative design cost. We present game redesign algorithms with the… 
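The interaction described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the paper's algorithm: the players run EXP3 (one standard no-regret bandit learner), the 2x2 payoff table is made up, the redesign rule (pay 1 for the target action, 0 otherwise) is a naive placeholder, and the design cost is assumed to be the absolute difference between redesigned and original payoff at the played profile. The names `Exp3` and `simulate` are hypothetical.

```python
import math
import random


class Exp3:
    """EXP3 bandit learner for rewards in [0, 1] (assumed player algorithm)."""

    def __init__(self, n_actions, gamma, rng):
        self.n = n_actions
        self.gamma = gamma
        self.rng = rng
        self.log_w = [0.0] * n_actions  # log-weights, for numerical stability

    def probs(self):
        m = max(self.log_w)
        w = [math.exp(x - m) for x in self.log_w]
        total = sum(w)
        # Mix the weight distribution with uniform exploration.
        return [(1 - self.gamma) * wi / total + self.gamma / self.n for wi in w]

    def act(self):
        p = self.probs()
        a = self.rng.choices(range(self.n), weights=p)[0]
        return a, p[a]

    def update(self, action, prob, reward):
        # Importance-weighted reward estimate; only the played action is seen.
        self.log_w[action] += self.gamma * (reward / prob) / self.n


def simulate(T=5000, seed=0):
    rng = random.Random(seed)
    # Hypothetical original game: original[a1][a2] = (payoff 1, payoff 2).
    original = [[(0.6, 0.6), (0.0, 1.0)],
                [(1.0, 0.0), (0.2, 0.2)]]
    target = (0, 0)  # action profile the designer wants played
    players = [Exp3(2, 0.05, rng), Exp3(2, 0.05, rng)]
    hits, cost = 0, 0.0
    for _ in range(T):
        picks = [p.act() for p in players]
        profile = (picks[0][0], picks[1][0])
        for i, player in enumerate(players):
            # Naive redesign (assumption): the target action becomes dominant.
            redesigned = 1.0 if profile[i] == target[i] else 0.0
            # Assumed cost: deviation from the original payoff at this profile.
            cost += abs(redesigned - original[profile[0]][profile[1]][i])
            player.update(picks[i][0], picks[i][1], redesigned)
        hits += profile == target
    return hits / T, cost


if __name__ == "__main__":
    freq, cost = simulate()
    print(f"target frequency: {freq:.3f}, cumulative design cost: {cost:.1f}")
```

In this sketch the naive rule meets goal (i) but not goal (ii): it changes the payoff even when the target profile is played, so its cumulative cost grows linearly with the number of rounds.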


Reward Poisoning Attacks on Offline Multi-Agent Reinforcement Learning
This work characterizes the exact conditions under which the attacker can install a target policy and shows the need for robust MARL against adversarial attacks.
Adversary-Aware Learning Techniques and Trends in Cybersecurity
This dissertation focused on semi-Supervised Learning with Graphs and its applications in Knowledge Discovery and Data Mining.


A simple adaptive procedure leading to correlated equilibrium
We propose a new and simple adaptive procedure for playing a game: "regret-matching." In this procedure, players may depart from their current play with probabilities that are proportional to…
The paper introduces and studies the implementation of desirable outcomes by a reliable party who cannot modify game rules (i.e. provide protocols), complementing previous work in mechanism design while making it more applicable to many realistic CS settings.
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
The focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs.
Adversarial Attacks on Stochastic Bandits
An adversarial attack against two popular bandit algorithms, $\epsilon$-greedy and UCB, without knowledge of the mean rewards is proposed, which means the attacker can easily hijack the behavior of the bandit algorithm to promote or obstruct certain actions.
Near Optimal Adversarial Attack on UCB Bandits
A novel attack strategy is proposed that manipulates a UCB principle into pulling some non-optimal target arm $T - o(T)$ times with a cumulative cost that scales as $\sqrt{\log T}$, where $T$ is the number of rounds.
Stochastic Graphical Bandits with Adversarial Corruptions
This paper proposes an online algorithm that can utilize the stochastic pattern and also tolerate the adversarial corruptions, and attains an $O(\alpha \ln K \ln T + \alpha C)$ regret, where $\alpha$ is the independence number of the feedback graph, $K$ is the number of arms, $T$ is the time horizon, and $C$ quantifies the total corruptions introduced by the adversary.
Introduction to Multi-Armed Bandits
This book provides a more introductory, textbook-like treatment of multi-armed bandits, providing a self-contained, teachable technical introduction and a brief review of the further developments.
Stochastic Linear Bandits Robust to Adversarial Attacks
In a contextual setting, a setup of diverse contexts is revisited, and it is shown that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
Robust Stochastic Bandit Algorithms under Probabilistic Unbounded Adversarial Attack
This paper investigates the attack model where an adversary attacks with a certain probability at each round, and its attack value can be arbitrary and unbounded if it attacks, and provides a high probability guarantee of O(log T) regret with respect to random rewards and random occurrence of attacks.
Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning
The results show that the attacker can easily succeed in teaching any target policy to the victim under mild conditions and highlight a significant security threat to reinforcement learning agents in practice.