In the multi-armed bandit problem, an online algorithm must choose from a set of strategies in a sequence of trials so as to minimize the total cost of the chosen strategies.

In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of $n$ trials so as to maximize the total payoff of the chosen strategies.
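As a rough illustration of the setting described above, the following sketch runs the standard UCB1 index rule on a finite set of strategies for $n$ trials. UCB1 is a textbook baseline chosen here for illustration only; it is not taken from the abstract. The names `ucb1`, `make_bernoulli_arms`, and `pull`, and the assumption of Bernoulli payoffs in [0, 1], are hypothetical.

```python
import math
import random


def make_bernoulli_arms(means, seed=0):
    """Return a hypothetical payoff oracle: pull(arm) yields 1.0 with probability means[arm]."""
    rng = random.Random(seed)

    def pull(arm):
        return 1.0 if rng.random() < means[arm] else 0.0

    return pull


def ucb1(pull, num_arms, n_trials):
    """Play n_trials rounds with the UCB1 rule; return (total payoff, pulls per arm)."""
    counts = [0] * num_arms   # number of times each strategy was chosen
    sums = [0.0] * num_arms   # cumulative payoff observed for each strategy
    total = 0.0

    for t in range(1, n_trials + 1):
        if t <= num_arms:
            # Initialization: try every strategy once.
            arm = t - 1
        else:
            # Choose the strategy with the largest upper confidence bound:
            # empirical mean plus an exploration bonus that shrinks with use.
            arm = max(
                range(num_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)    # only the chosen strategy's payoff is observed
        counts[arm] += 1
        sums[arm] += reward
        total += reward

    return total, counts
```

For example, `ucb1(make_bernoulli_arms([0.1, 0.9]), num_arms=2, n_trials=2000)` returns the accumulated payoff together with how often each strategy was played. The algorithm sees only the payoff of the strategy it chose in each trial, which is what distinguishes the bandit setting from full-information online learning.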

We consider price-setting algorithms for a simple market in which a seller has an unlimited supply of identical copies of some good, and interacts sequentially with a pool of $n$ buyers, each of whom wants at most one copy of the good.

We study a partial information online-learning problem where actions are restricted to noisy comparisons between pairs of strategies (also known as bandits).

We present two algorithms whose reward is close to the information-theoretic optimum: one is based on a novel "balanced exploration" paradigm and the other is a primal-dual algorithm that uses multiplicative updates.

Algorithmic pricing is the computational problem that sellers (e.g., supermarkets) face when trying to set prices for their items to maximize their profit in the presence of a known demand.