Bayesian Persuasion in Sequential Decision-Making

@article{Gan2022BayesianPI,
  title={Bayesian Persuasion in Sequential Decision-Making},
  author={Jiarui Gan and Rupak Majumdar and Goran Radanovic and Adish Kumar Singla},
  journal={arXiv preprint arXiv:2106.05137},
  year={2022}
}
We study a dynamic model of Bayesian persuasion in sequential decision-making settings. An informed principal observes an external parameter of the world and advises an uninformed agent about actions to take over time. The agent takes actions in each time step based on the current state, the principal's advice/signal, and beliefs about the external parameter. The action of the agent updates the state according to a stochastic process. The model arises naturally in many applications, e.g., an… 
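The loop described in the abstract (principal observes the external parameter, signals; agent updates beliefs, acts; the action drives a stochastic state transition) can be sketched in a few lines. This is a minimal toy illustration, not the paper's construction: the signaling scheme, decision rule, and transition below are all hypothetical placeholders.

```python
import random

def persuasion_episode(horizon=5, seed=0):
    """Toy round-by-round persuasion loop: signal -> belief update -> action -> transition."""
    rng = random.Random(seed)
    theta = rng.choice(["high", "low"])  # external parameter, observed only by the principal
    state = 0
    history = []
    for t in range(horizon):
        # Toy signaling scheme: the principal reveals theta with probability 0.8.
        signal = theta if rng.random() < 0.8 else rng.choice(["high", "low"])
        # Agent's posterior that theta == "high", by Bayes' rule on the signal.
        prior = 0.5
        p_sig_given_high = 0.8 if signal == "high" else 0.2
        p_sig_given_low = 0.2 if signal == "high" else 0.8
        belief_high = (p_sig_given_high * prior) / (
            p_sig_given_high * prior + p_sig_given_low * (1 - prior)
        )
        # Agent acts on the current state and its belief (toy decision rule).
        action = 1 if belief_high > 0.5 else 0
        # The action updates the state via a stochastic transition.
        state += action if rng.random() < 0.9 else -action
        history.append((t, signal, round(belief_high, 2), action, state))
    return history

for step in persuasion_episode():
    print(step)
```

With a symmetric 0.5 prior and an 0.8-accurate signal, the posterior is always 0.8 or 0.2, so in this toy rule the agent simply follows the signal; richer priors or signal distributions would make the belief dynamics less trivial.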


Automated Dynamic Mechanism Design

TLDR
It is shown that memoryless mechanisms, which are without loss of generality optimal in Markov decision processes without strategic behavior, do not provide a good solution for the Bayesian automated mechanism design problem, in terms of both optimality and computational tractability.

Sequential Decision Making With Information Asymmetry

We survey some recent results in sequential decision making under uncertainty, where there is an information asymmetry among the decision-makers. We consider two versions of the problem: persuasion

Forward-Looking Dynamic Persuasion for Pipeline Stochastic Bayesian Game: A Fixed-Point Alignment Principle

This paper studies a general-sum two-player pipeline stochastic game where each period is composed of two stages. The agents have uncertainty about the transition of the state which is characterized

Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning

TLDR
A provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of the optimism and pessimism principles to prevent the receiver's detrimental equilibrium behavior.

Fixed-Point Alignment: Incentive Bayesian Persuasion for Pipeline Stochastic Bayesian Game

This letter studies a general-sum pipeline stochastic game where each period is composed of two pipelined stages. The first stage is a cognitive decision-making in which each agent selects one of

Sequential Information Design: Learning to Persuade in the Dark

TLDR
This work studies the case in which the sender does not know the probabilities of random events and thus must gradually learn them while persuading the receiver; it proves a negative result: no learning algorithm can be persuasive.

References

Showing 1-10 of 45 references

Automated Dynamic Mechanism Design

TLDR
It is shown that memoryless mechanisms, which are without loss of generality optimal in Markov decision processes without strategic behavior, do not provide a good solution for the Bayesian automated mechanism design problem, in terms of both optimality and computational tractability.

Private Bayesian Persuasion with Sequential Games

TLDR
It is shown that, for games with two receivers, an optimal ex ante persuasive signaling scheme can be computed in polynomial time via a novel algorithm based on the ellipsoid method.

Bayesian Exploration: Incentivizing Exploration in Bayesian Games

TLDR
The goal is to design a recommendation policy for the principal that respects agents' incentives and minimizes a suitable notion of regret; it is shown how the principal can identify (and explore) all explorable actions and use the revealed information to perform optimally.

Optimal dynamic information provision

Value-Based Policy Teaching with Active Indirect Elicitation

TLDR
This work provides a method for active indirect elicitation wherein the agent's reward function is inferred from observations of its response to incentives; it shows that value-based policy teaching is NP-hard and provides a mixed-integer program formulation.

Multi-Receiver Online Bayesian Persuasion

TLDR
A general online gradient descent scheme is provided to handle online learning problems with a finite number of possible loss functions, and a negative result is proved: for any 0 < α ≤ 1, there is no polynomial-time no-α-regret algorithm when the sender's utility function is supermodular or anonymous.

Online Bayesian Persuasion

TLDR
A hardness result is proved on the per-round running time required to achieve no-α-regret for any α < 1 and algorithms for the full and partial feedback models with regret bounds sublinear in the number of rounds and polynomial in the size of the instance are provided.

Policy teaching through reward function learning

TLDR
This paper considers the specific objective of inducing a pre-specified desired policy, and examines both the case in which the agent's reward function is known and unknown to the interested party, presenting a linear program for the former and formulating an active, indirect elicitation method for the latter.

Complexity of Mechanism Design

TLDR
Focusing on settings where side payments are not possible, it is shown that the mechanism design problem is NP-complete for deterministic mechanisms, whereas if randomized mechanisms are allowed, the problem becomes tractable.

Non-Cooperative Inverse Reinforcement Learning

TLDR
The non-cooperative inverse reinforcement learning (N-CIRL) formalism is introduced and the benefits of this formalism over the existing multi-agent IRL formalism are demonstrated via extensive numerical simulation in a novel cyber security setting.