Bandit Problems with Lévy Payoff Processes

Abstract

We study one-arm Lévy bandits in continuous time, which have one safe arm that yields a constant payoff s and one risky arm that can be of either type High or type Low; both types yield stochastic payoffs generated by a Lévy process. The expected payoff of the Lévy process is greater than s when the arm is High and lower than s when the arm is Low. The decision maker (DM) has to choose, at any given time t, the fraction of the resource to allocate to each arm over the time interval [t, t+dt). We show that, under suitable conditions on the Lévy processes, there is a unique optimal strategy, which is a cut-off strategy, and we provide explicit formulas for the cut-off and for the corresponding expected payoff in terms of the data of the problem. We also examine the case where the DM has an incorrect prior over the type of the risky arm, and we calculate the expected payoff of a DM who plays the optimal strategy corresponding to the incorrect prior. In addition, we study two applications of the results: (a) we show how to price information in one-arm Lévy bandit problems, and (b) we investigate who fares better in one-arm bandit problems: an optimist who assigns a probability higher than the true probability to High, or a pessimist who assigns a probability lower than the true probability to High.
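
To make the notion of a cut-off strategy concrete, the following Python sketch simulates one in a special case that is an assumption of this illustration, not the paper's general setting: the risky arm is a Brownian motion with drift (one particular Lévy process) with drift mu_high or mu_low and volatility sigma, and payoffs are discounted at a rate r. The function names, the parameter values, and in particular the cut-off value are hypothetical; the paper's explicit cut-off formula is not reproduced here. The DM plays the risky arm while the posterior probability of High is above the cut-off and switches to the safe arm permanently once it is not.

```python
import math
import random


def posterior_high(p0, x, t, mu_high, mu_low, sigma):
    """Posterior probability of High after observing cumulative risky payoff x
    over total risky time t (Brownian-with-drift assumption, prior p0 in (0, 1))."""
    # Log-likelihood ratio of drift mu_high versus drift mu_low.
    log_lr = ((mu_high - mu_low) * x
              - 0.5 * (mu_high ** 2 - mu_low ** 2) * t) / sigma ** 2
    odds = p0 / (1.0 - p0) * math.exp(log_lr)
    return odds / (1.0 + odds)


def simulate_cutoff(p0, cutoff, is_high, mu_high, mu_low, sigma, s, r,
                    dt=0.01, horizon=50.0, seed=0):
    """Discounted payoff of one simulated path under a cut-off strategy.

    `cutoff` is a hypothetical threshold supplied by the caller, not the
    optimal cut-off derived in the paper. `p0` is the DM's prior, which
    may differ from the truth encoded by `is_high`.
    """
    rng = random.Random(seed)
    mu = mu_high if is_high else mu_low
    x = 0.0        # cumulative payoff observed on the risky arm
    t_risky = 0.0  # total time spent on the risky arm
    payoff = 0.0
    time = 0.0
    p = p0
    while time < horizon:
        if p <= cutoff:
            # The belief stops moving on the safe arm, so the switch is
            # permanent: add the discounted safe payoff up to the horizon.
            payoff += s * (math.exp(-r * time) - math.exp(-r * horizon)) / r
            break
        # Euler step of the risky arm's payoff increment.
        dx = mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        payoff += math.exp(-r * time) * dx
        x += dx
        t_risky += dt
        p = posterior_high(p0, x, t_risky, mu_high, mu_low, sigma)
        time += dt
    return payoff
```

Averaging simulate_cutoff over many seeds, with is_high drawn from the true probability while p0 is set above or below it, gives a rough numerical way to compare an optimist with a pessimist in this special case; such a simulation only approximates the expected payoffs that the paper computes in closed form.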

Cite this paper

@inproceedings{Cohen2008BanditPW, title={Bandit Problems with Lévy Payoff Processes}, author={Asaf Cohen and Eilon Solan}, year={2008} }