An Optimal Algorithm for Linear Bandits


We provide the first algorithm for online bandit linear optimization whose regret after $T$ rounds is of order $\sqrt{Td \ln N}$ on any finite class $\mathcal{X} \subseteq \mathbb{R}^d$ of $N$ actions, and of order $d\sqrt{T}$ (up to log factors) when $\mathcal{X}$ is infinite. These bounds are not improvable in general. The basic idea utilizes tools from convex geometry to construct what is essentially an optimal…
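The setting can be made concrete with a minimal Python sketch of a generic Exp2-style (exponentially weighted) learner for a finite action set $\mathcal{X} \subseteq \mathbb{R}^d$ under bandit feedback. This is not the algorithm of this paper. The exploration distribution here is simply uniform over the actions, an assumption that stands in for the convex-geometry construction the abstract alludes to. The function name `exp2_uniform` and the parameters `eta` and `gamma` are hypothetical and untuned, so no regret guarantee is claimed for this sketch. In each round the learner observes only the scalar loss $x_t^\top \ell_t$ of the action it played. From that loss it forms the unbiased estimate $\hat\ell_t = P_t^{-1} x_t x_t^\top \ell_t$, where $P_t = \mathbb{E}_{x \sim p_t}[x x^\top]$. The sketch assumes $P_t$ is invertible, which holds whenever the actions span $\mathbb{R}^d$.

```python
import numpy as np


def exp2_uniform(actions, loss_vectors, eta, gamma, seed=0):
    """Hypothetical Exp2-style sketch for a finite linear bandit.

    actions:      (N, d) array whose rows are the action vectors (assumed to span R^d).
    loss_vectors: (T, d) array holding the adversary's loss vector for each round.
    Returns (learner's cumulative loss, best fixed action's cumulative loss).
    """
    rng = np.random.default_rng(seed)
    n, _ = actions.shape
    cum_est = np.zeros(n)          # cumulative *estimated* loss of each action
    explore = np.full(n, 1.0 / n)  # ASSUMPTION: uniform exploration, not the paper's construction
    total = 0.0
    for loss in loss_vectors:
        # Exponential weights on estimated losses; subtracting the minimum avoids overflow.
        w = np.exp(-eta * (cum_est - cum_est.min()))
        p = (1.0 - gamma) * w / w.sum() + gamma * explore
        i = rng.choice(n, p=p)
        x = actions[i]
        observed = float(x @ loss)  # bandit feedback: only the played action's scalar loss
        total += observed
        # P = sum_j p_j x_j x_j^T, the second-moment matrix of the sampling distribution.
        P = (actions.T * p) @ actions
        # Unbiased estimate of the loss vector: P^{-1} x (x^T loss).
        loss_hat = np.linalg.solve(P, x) * observed
        cum_est += actions @ loss_hat
    best = float((actions @ loss_vectors.sum(axis=0)).min())
    return total, best
```

For example, with `actions = np.eye(3)` the problem reduces to a three-armed bandit, and repeating the loss vector `[0.2, 0.5, 0.8]` every round makes the first action the best in hindsight. Any spanning set of action vectors in $\mathbb{R}^d$ can be passed in the same way.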



@article{CesaBianchi2011AnOA,
  title   = {An Optimal Algorithm for Linear Bandits},
  author  = {Nicol{\`o} Cesa-Bianchi and Sham M. Kakade},
  journal = {CoRR},
  year    = {2011},
  volume  = {abs/1110.4322}
}