• Corpus ID: 15936963

# Structured Stochastic Linear Bandits (DRAFT)

@inproceedings{Johnson2016StructuredSL,
title={Structured Stochastic Linear Bandits (DRAFT)},
author={Nicholas Johnson and Vidyashankar Sivakumar and Arindam Banerjee},
year={2016}
}
In this paper, we consider the structured stochastic linear bandit problem, a sequential decision-making problem in which, at each round t, the algorithm selects a p-dimensional vector x_t from a convex set and then observes a loss ℓ_t(x_t). We assume the loss is a linear function of the chosen vector and an unknown parameter θ∗. We consider the setting in which θ∗ is structured, which we characterize as having a small value according to some norm, e.g., s-sparse, group-sparse, etc. We…
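As a minimal sketch of the interaction protocol described above (all names, dimensions, and the noise model are illustrative assumptions, not taken from the paper), the loop of selecting an action from a convex set and observing a scalar linear loss can be written as:

```python
import numpy as np

rng = np.random.default_rng(0)

p = 20  # ambient dimension (illustrative)
s = 3   # sparsity of the unknown parameter (illustrative)
T = 100 # number of rounds

# Unknown s-sparse parameter theta*: "structured" in the paper's sense
# of having small value under a norm (here, small L0/L1).
theta_star = np.zeros(p)
theta_star[rng.choice(p, size=s, replace=False)] = rng.standard_normal(s)

def project_to_ball(x):
    """Keep the action inside the unit Euclidean ball (a convex set)."""
    n = np.linalg.norm(x)
    return x / n if n > 1 else x

total_loss = 0.0
for t in range(T):
    # Placeholder strategy: a random feasible action. A real bandit
    # algorithm would choose x_t from past scalar observations only.
    x_t = project_to_ball(rng.standard_normal(p))
    # Linear loss with additive noise; only this scalar is revealed.
    loss_t = x_t @ theta_star + 0.1 * rng.standard_normal()
    total_loss += loss_t

print(total_loss)
```

The point of the sketch is the information structure: the learner never sees θ∗, only the scalar losses ℓ_t(x_t), which is what distinguishes bandit feedback from full-information online learning.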

## References

Showing 1–10 of 32 references
Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit
• AISTATS 2012
Combines ideas from compressed sensing and bandit theory to derive an algorithm with a regret bound of O(S√n), with an application to optimizing a function that depends on many variables, only a small number of which are relevant.
Stochastic Linear Optimization under Bandit Feedback
• COLT 2008
Gives a nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret, and presents two variants of an algorithm based on the idea of "upper confidence bounds".
Sparsity Regret Bounds for Individual Sequences in Online Linear Regression
Introduces the notion of a sparsity regret bound, a deterministic online counterpart of recent risk bounds derived in the stochastic setting under a sparsity scenario, and proves such a bound for SeqSEW, an online-learning algorithm based on exponential weighting and data-driven truncation.
Contextual Bandits with Linear Payoff Functions
• AISTATS 2011
Proves an O(√(Td ln(KT ln(T)/δ))) regret bound, holding with probability 1 − δ, for the simplest known upper confidence bound algorithm for this problem.
The Price of Bandit Information for Online Optimization
• NIPS 2007
Presents an algorithm achieving O*(n^{3/2}√T) regret, together with lower bounds showing that this gap is at least √n, which is conjectured to be the correct order.
Improved Algorithms for Linear Stochastic Bandits
• NIPS 2011
Shows that a simple modification of Auer's UCB algorithm achieves, with high probability, constant regret, and improves the regret bound by a logarithmic factor, with experiments showing a vast improvement.
A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers
• NIPS 2009
Provides a unified framework for establishing consistency and convergence rates for regularized M-estimators under high-dimensional scaling, states one main theorem, and shows how it can be used both to re-derive several existing results and to obtain several new ones.
Beating the adaptive bandit with high probability
• 2009 Information Theory and Applications Workshop
We provide a principled way of proving Õ(√T) high-probability guarantees for partial-information (bandit) problems over arbitrary convex decision sets. First, we prove a regret guarantee for the…