• Corpus ID: 15936963

# Structured Stochastic Linear Bandits (DRAFT)

@inproceedings{Johnson2016StructuredSL,
title={Structured Stochastic Linear Bandits (DRAFT)},
author={Nicholas Johnson and Vidyashankar Sivakumar and Arindam Banerjee},
year={2016}
}
In this paper, we consider the structured stochastic linear bandit problem, a sequential decision-making problem in which, at each round t, the algorithm selects a p-dimensional vector x_t from a convex set and then observes a loss ℓ_t(x_t). We assume the loss is a linear function of the chosen vector and an unknown parameter θ∗. We consider the setting in which θ∗ is structured, which we characterize as having a small value according to some norm, e.g., s-sparse, group-sparse, etc. We…
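As a minimal sketch of the interaction protocol described above (all names, dimensions, and the noise model are illustrative assumptions, not taken from the paper), the loop of selecting an action from a convex set and observing a scalar linear loss can be written as:

```python
import numpy as np

rng = np.random.default_rng(0)

p = 20  # ambient dimension (illustrative)
s = 3   # sparsity of the unknown parameter (illustrative)
T = 100 # number of rounds

# Unknown s-sparse parameter theta*: "structured" in the paper's sense
# of having small value under a norm (here, small L0/L1).
theta_star = np.zeros(p)
theta_star[rng.choice(p, size=s, replace=False)] = rng.standard_normal(s)

def project_to_ball(x):
    """Keep the action inside the unit Euclidean ball (a convex set)."""
    n = np.linalg.norm(x)
    return x / n if n > 1 else x

total_loss = 0.0
for t in range(T):
    # Placeholder strategy: a random feasible action. A real bandit
    # algorithm would choose x_t from past scalar observations only.
    x_t = project_to_ball(rng.standard_normal(p))
    # Linear loss with additive noise; only this scalar is revealed.
    loss_t = x_t @ theta_star + 0.1 * rng.standard_normal()
    total_loss += loss_t

print(total_loss)
```

The point of the sketch is the information structure: the learner never sees θ∗, only the scalar losses ℓ_t(x_t), which is what distinguishes bandit feedback from full-information online learning.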

## References

Showing 1–10 of 32 references
Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit
• AISTATS 2012
Combines ideas from compressed sensing and bandit theory to derive an algorithm with a regret bound of O(S√n), with an application to optimizing a function that depends on many variables, only a small number of which are relevant.
Stochastic Linear Optimization under Bandit Feedback
• COLT 2008
Gives a nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret, and presents two variants of an algorithm based on the idea of "upper confidence bounds".
Sparsity Regret Bounds for Individual Sequences in Online Linear Regression
Introduces the notion of a sparsity regret bound, a deterministic online counterpart of recent risk bounds derived in the stochastic setting under a sparsity scenario, and proves such a bound for SeqSEW, an online-learning algorithm based on exponential weighting and data-driven truncation.
Contextual Bandits with Linear Payoff Functions
• AISTATS 2011
Proves an O(√(Td ln(KT ln(T)/δ))) regret bound, holding with probability 1 − δ, for the simplest known upper confidence bound algorithm for this problem.
The Price of Bandit Information for Online Optimization
• NIPS 2007
Presents an algorithm achieving O*(n^{3/2}√T) regret, together with lower bounds showing that this gap is at least √n, which is conjectured to be the correct order.
Improved Algorithms for Linear Stochastic Bandits
• NIPS 2011
Shows that a simple modification of Auer's UCB algorithm achieves, with high probability, constant regret, and improves the regret bound by a logarithmic factor, with experiments showing a vast improvement.
A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers
• NIPS 2009
Provides a unified framework for establishing consistency and convergence rates for regularized M-estimators under high-dimensional scaling, states one main theorem, and shows how it can be used both to re-derive several existing results and to obtain several new ones.
Beating the adaptive bandit with high probability
• 2009 Information Theory and Applications Workshop
We provide a principled way of proving Õ(√T) high-probability guarantees for partial-information (bandit) problems over arbitrary convex decision sets. First, we prove a regret guarantee for the…