Learning in Stackelberg Games with Non-myopic Agents

Nika Haghtalab, Thodoris Lykouris, Sloan Nietert, and Alexander Wei
Proceedings of the 23rd ACM Conference on Economics and Computation

Stackelberg games are a canonical model for strategic principal-agent interactions. Consider, for instance, a defense system that distributes its security resources across high-risk targets prior to attacks being executed; or a tax policymaker who sets rules on when audits are triggered prior to seeing filed tax reports; or a seller who chooses a price prior to knowing a customer's proclivity to buy. In each of these scenarios, a principal first selects an action x∈X and then an agent reacts… 



Computing the optimal strategy to commit to

This paper studies how to compute optimal strategies to commit to under both commitment to pure strategies and commitment to mixed strategies, in both normal-form and Bayesian games.

Commitment Without Regrets: Online Learning in Stackelberg Security Games

This work designs no-regret algorithms whose regret (when compared to the best fixed strategy in hindsight) is polynomial in the parameters of the game and sublinear in the number of time steps.

Contextual search in the presence of irrational agents

We study contextual search, a generalization of binary search in higher dimensions, which captures settings such as feature-based dynamic pricing. Standard game-theoretic formulations of this problem

Learning and Approximating the Optimal Strategy to Commit To

This work considers the computation of optimal Stackelberg strategies in general two-player Bayesian games, given that all the payoffs and the prior distribution over types are known.

Learning Auctions with Robust Incentive Guarantees

We study the problem of learning Bayesian-optimal revenue-maximizing auctions. The classical approach to maximizing revenue requires a known prior distribution on the demand of the bidders, although

Stackelberg vs. Nash in Security Games: An Extended Investigation of Interchangeability, Equivalence, and Uniqueness

This work shows that the Nash equilibria in security games are interchangeable, which alleviates the equilibrium selection problem, and proposes an extensive-form game model that makes explicit the defender's uncertainty about the attacker's ability to observe.

Finite-time Analysis of the Multiarmed Bandit Problem

This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.

Dynamic Incentive-Aware Learning: Robust Pricing in Contextual Auctions

This work proposes two learning policies that are robust to strategic behavior in repeated contextual second-price auctions and uses the outcomes of the auctions, rather than the submitted bids, to estimate the preferences while controlling the long-term effect of the outcome of each auction on the future reserve prices.

Delay and Cooperation in Nonstochastic Bandits

This work introduces EXP3-COOP, a cooperative version of the EXP3 algorithm, and proves that with K actions and N agents the average per-agent regret after T rounds is at most of order $\sqrt{\left(d + 1 + \frac{K}{N}\alpha_{\le d}\right)(T \ln K)}$, where $\alpha_{\le d}$ is the independence number of the d-th power of the communication graph G.

Learning Optimal Reserve Price against Non-myopic Bidders

Algorithms are introduced that obtain small regret against non-myopic bidders either when the market is large, i.e., no bidder appears in a constant fraction of the rounds, or when the bidders are impatient, meaning they discount future utility by some factor mildly bounded away from one.