# Learning in Stackelberg Games with Non-myopic Agents

@article{Haghtalab2022LearningIS, title={Learning in Stackelberg Games with Non-myopic Agents}, author={Nika Haghtalab and Thodoris Lykouris and Sloan Nietert and Alexander Wei}, journal={Proceedings of the 23rd ACM Conference on Economics and Computation}, year={2022} }

Stackelberg games are a canonical model for strategic principal-agent interactions. Consider, for instance, a defense system that distributes its security resources across high-risk targets prior to attacks being executed; or a tax policymaker who sets rules on when audits are triggered prior to seeing filed tax reports; or a seller who chooses a price prior to knowing a customer's proclivity to buy. In each of these scenarios, a principal first selects an action x∈X and then an agent reacts…

## References


### Computing the optimal strategy to commit to

- Computer Science
- EC '06
- 2006

This paper studies how to compute optimal strategies to commit to under both commitment to pure strategies and commitment to mixed strategies, in both normal-form and Bayesian games.

### Commitment Without Regrets: Online Learning in Stackelberg Security Games

- Computer Science
- EC
- 2015

This work designs no-regret algorithms whose regret (when compared to the best fixed strategy in hindsight) is polynomial in the parameters of the game and sublinear in the number of time steps.

### Contextual search in the presence of irrational agents

- Computer Science
- STOC
- 2021

We study contextual search, a generalization of binary search in higher dimensions, which captures settings such as feature-based dynamic pricing. Standard game-theoretic formulations of this problem…

### Learning and Approximating the Optimal Strategy to Commit To

- Computer Science
- SAGT
- 2009

This work considers the computation of optimal Stackelberg strategies in general two-player Bayesian games, given that all the payoffs and the prior distribution over types are known.

### Learning Auctions with Robust Incentive Guarantees

- Economics
- NeurIPS
- 2019

We study the problem of learning Bayesian-optimal revenue-maximizing auctions. The classical approach to maximizing revenue requires a known prior distribution on the demand of the bidders, although…

### Stackelberg vs. Nash in Security Games: An Extended Investigation of Interchangeability, Equivalence, and Uniqueness

- Computer Science
- J. Artif. Intell. Res.
- 2011

This work shows that the Nash equilibria in security games are interchangeable, which alleviates the equilibrium selection problem, and proposes an extensive-form game model that makes explicit the defender's uncertainty about the attacker's ability to observe.

### Finite-time Analysis of the Multiarmed Bandit Problem

- Computer Science
- Machine Learning
- 2004

This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.

### Dynamic Incentive-Aware Learning: Robust Pricing in Contextual Auctions

- Economics
- NeurIPS
- 2019

This work proposes two learning policies that are robust to strategic behavior in repeated contextual second-price auctions. The policies estimate bidder preferences from the outcomes of the auctions rather than from the submitted bids, while controlling the long-term effect of each auction's outcome on future reserve prices.

### Delay and Cooperation in Nonstochastic Bandits

- Computer Science
- COLT
- 2016

This work introduces EXP3-COOP, a cooperative version of the EXP3 algorithm, and proves that with $K$ actions and $N$ agents the average per-agent regret after $T$ rounds is at most of order $\sqrt{\left(d + 1 + \frac{K}{N}\alpha_{\le d}\right)(T \ln K)}$, where $\alpha_{\le d}$ is the independence number of the $d$-th power of the communication graph $G$.

### Learning Optimal Reserve Price against Non-myopic Bidders

- Economics, Computer Science
- NeurIPS
- 2018

Algorithms are introduced that obtain small regret against non-myopic bidders either when the market is large, i.e., no bidder appears in a constant fraction of the rounds, or when the bidders are impatient, which means they discount future utility by some factor mildly bounded away from one.