• Publications
Online Double Oracle
TLDR
This paper proposes Online Double Oracle (ODO), a new learning algorithm for solving two-player zero-sum normal-form games in which the number of pure strategies is prohibitively large; ODO is rational in the sense that each agent can exploit a strategic adversary while incurring only sublinear regret.
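For intuition, a minimal sketch of the (offline) double-oracle loop that ODO builds on is given below: each player keeps a small set of pure strategies, a minimax solution of the restricted game is computed by linear programming, and best responses against that solution are added until neither player benefits from a new strategy. The LP-based solver, function names, and stopping rule are illustrative assumptions, not the paper's online algorithm.

    # Offline double-oracle sketch for a two-player zero-sum matrix game (illustrative only).
    import numpy as np
    from scipy.optimize import linprog

    def solve_zero_sum(A):
        """Maximin mixed strategy and value for the row player of payoff matrix A, via LP."""
        m, n = A.shape
        c = np.zeros(m + 1); c[-1] = -1.0                         # maximize the game value v
        A_ub = np.hstack([-A.T, np.ones((n, 1))])                 # v <= (A^T x)_j for every column j
        A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])     # x is a probability vector
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                      A_eq=A_eq, b_eq=np.array([1.0]),
                      bounds=[(0, None)] * m + [(None, None)])
        return res.x[:m], res.x[-1]

    def double_oracle(A, iters=100):
        rows, cols = {0}, {0}                                     # start from one pure strategy each
        for _ in range(iters):
            R, C = sorted(rows), sorted(cols)
            x_sub, _ = solve_zero_sum(A[np.ix_(R, C)])
            y_sub, _ = solve_zero_sum(-A[np.ix_(R, C)].T)         # column player = row player of -A^T
            x = np.zeros(A.shape[0]); x[R] = x_sub
            y = np.zeros(A.shape[1]); y[C] = y_sub
            br_row = int(np.argmax(A @ y))                        # row best response to the column mix
            br_col = int(np.argmin(x @ A))                        # column best response to the row mix
            if br_row in rows and br_col in cols:
                break                                             # no new best responses: done
            rows.add(br_row); cols.add(br_col)
        return x, y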
Last Round Convergence and No-Instant Regret in Repeated Games with Asymmetric Information
TLDR
A no-instant-regret algorithm for the column player that exhibits last-round convergence to a minimax equilibrium is developed and shown to be efficient against a large set of popular no-regret algorithms of the row player.
Last Round Convergence and No-Dynamic Regret in Asymmetric Repeated Games
TLDR
A no-dynamic-regret algorithm for the column player that exhibits last-round convergence to a minimax equilibrium is developed, and this algorithm is shown to be efficient against a large set of popular no-regret algorithms the row player can use.
Exploiting No-Regret Algorithms in System Design
TLDR
This work proposes a new game-playing algorithm for the system designer and proves that it can guide the row player, who may play any "stable" no-regret algorithm, to converge to a minimax solution.
How to Guide a Non-Cooperative Learner to Cooperate: Exploiting No-Regret Algorithms in System Design
TLDR
This work investigates a repeated two-player game setting where the column player is also the designer of the system and has full control over the payoff matrices, and proposes a novel zero-sum game construction whose unique minimax solution contains the desired behaviour.
Online Markov Decision Processes with Non-oblivious Strategic Adversary
TLDR
This work studies a novel setting in Online Markov Decision Processes where the loss function is chosen by a non-oblivious strategic adversary who follows a no-external-regret algorithm, and demonstrates that MDPExpert, an existing algorithm that works well with oblivious adversaries, can still be applied and achieves a sublinear policy regret bound.
Playing Coopetitive Polymatrix Games with Small Manipulation Cost
Iterated coopetitive games capture the situation when one must efficiently balance between cooperation and competition with the other agents over time in order to win the game (e.g., to become the
Online Learning against Strategic Adversary
TLDR
A no-dynamic-regret algorithm for the column player that exhibits last-round convergence to a minimax equilibrium is developed and shown to be efficient against a large set of popular no-regret algorithms the row player can use, including the multiplicative weights update algorithm, general follow-the-regularized-leader, and any no-regret algorithm satisfying a property called "stability".
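The multiplicative weights update rule named above is the standard no-regret baseline in these papers; a minimal sketch follows, with an illustrative learning rate and loss model that are assumptions rather than the papers' exact setup.

    # Multiplicative weights update (Hedge) over k actions against observed loss vectors.
    import numpy as np

    def mwu(loss_fn, k, T, eta=0.1):
        """Run MWU for T rounds; loss_fn(x, t) returns a loss vector in [0, 1]^k."""
        weights = np.ones(k)
        history = []
        for t in range(T):
            x = weights / weights.sum()          # current mixed strategy
            history.append(x)
            losses = loss_fn(x, t)               # the (possibly strategic) adversary picks the losses
            weights *= np.exp(-eta * losses)     # exponentially down-weight lossy actions
        return history

    # Toy example: the column player of this matrix always plays column 1, so the
    # row player's probability mass shifts toward row 1 (the low-loss action).
    A = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
    print(mwu(lambda x, t: A[:, 1], k=2, T=200)[-1])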
Playing Repeated Coopetitive Polymatrix Games with Small Manipulation Cost
TLDR
This paper proposes a payoff-matrix manipulation scheme and a sequence of strategies for an agent that provably guarantee the utility of any opponent converges to a value the authors desire, and designs winning policies for this agent.