Corpus ID: 15936963

Structured Stochastic Linear Bandits (DRAFT)

@inproceedings{Johnson2016StructuredSL,
  title={Structured Stochastic Linear Bandits (DRAFT)},
  author={Nicholas Johnson and Vidyashankar Sivakumar and Arindam Banerjee},
  year={2016}
}
In this paper, we consider the structured stochastic linear bandit problem, a sequential decision-making problem in which, at each round t, the algorithm selects a p-dimensional vector xt from a convex set and then observes a loss ℓt(xt). We assume the loss is a linear function of the selected vector and an unknown parameter θ∗. We consider the setting in which θ∗ is structured, which we characterize as θ∗ having a small value under some norm, e.g., s-sparse, group-sparse, etc. We…
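To make the round-by-round protocol concrete, below is a minimal Python sketch of one way the interaction described above could be simulated. The s-sparse θ∗, the Gaussian noise model, the unit-ball decision set, and the crude explore-then-greedy ridge-regression strategy are all illustrative assumptions, not the algorithm analyzed in the paper.

import numpy as np

rng = np.random.default_rng(0)

p, T, s = 20, 500, 3          # dimension, horizon, sparsity level (assumed)
theta_star = np.zeros(p)      # unknown structured parameter: s-sparse
theta_star[:s] = 1.0

def project_to_ball(x):
    """Keep decisions inside a convex set (here: the unit L2 ball)."""
    n = np.linalg.norm(x)
    return x / n if n > 1 else x

X, losses = [], []
for t in range(1, T + 1):
    if t <= p or rng.random() < 0.1:          # crude exploration phase
        x_t = project_to_ball(rng.standard_normal(p))
    else:                                      # greedy w.r.t. a ridge estimate
        A = np.array(X)
        theta_hat = np.linalg.solve(A.T @ A + np.eye(p), A.T @ np.array(losses))
        x_t = project_to_ball(-theta_hat)      # minimize the estimated linear loss
    loss_t = x_t @ theta_star + 0.1 * rng.standard_normal()  # linear loss + noise
    X.append(x_t)
    losses.append(loss_t)

# Pseudo-regret: expected loss of the play sequence minus that of the best
# fixed decision in hindsight (here computable because theta_star is known).
x_best = project_to_ball(-theta_star)
regret = sum(x @ theta_star for x in X) - T * (x_best @ theta_star)
print(f"cumulative regret after {T} rounds: {regret:.2f}")

The greedy strategy stands in for whatever structure-exploiting estimator the paper proposes; swapping in a norm-regularized estimator of θ∗ at that step is where the structural assumption would be used.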


References

SHOWING 1-10 OF 32 REFERENCES
Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit
TLDR
This paper combines ideas from Compressed Sensing and Bandit Theory to derive an algorithm with a regret bound of O(S√n), with an application to the problem of optimizing a function that depends on many variables, only a small number of which are relevant.
Stochastic Linear Optimization under Bandit Feedback
TLDR
A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of “upper confidence bounds” are presented.
Sparsity Regret Bounds for Individual Sequences in Online Linear Regression
TLDR
The notion of a sparsity regret bound, a deterministic online counterpart of recent risk bounds derived in the stochastic setting under a sparsity scenario, is introduced; such a bound is proved for an online-learning algorithm called SeqSEW, which is based on exponential weighting and data-driven truncation.
Contextual Bandits with Linear Payoff Functions
TLDR
An O(√(Td ln³(KT ln(T)/δ))) regret bound is proved that holds with probability 1 − δ for the simplest known upper confidence bound algorithm for this problem.
The Price of Bandit Information for Online Optimization
TLDR
This paper presents an algorithm which achieves O*(n^{3/2}√T) regret, and presents lower bounds showing that this gap is at least √n, which is conjectured to be the correct order.
Improved Algorithms for Linear Stochastic Bandits
TLDR
A simple modification of Auer's UCB algorithm achieves constant regret with high probability and improves the regret bound by a logarithmic factor, though experiments show a vast improvement in practice.
A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers
TLDR
A unified framework for establishing consistency and convergence rates for regularized M-estimators under high-dimensional scaling is provided; one main theorem is stated, and it is shown how it can be used to re-derive several existing results and to obtain several new ones.
Beating the adaptive bandit with high probability
We provide a principled way of proving Õ(√T) high-probability guarantees for partial-information (bandit) problems over arbitrary convex decision sets. First, we prove a regret guarantee for the…
Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary
TLDR
This paper gives an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala, for the case of an adaptive adversary, and achieves a regret bound of \(\mathcal{O}(T^{3/4}\sqrt{\ln T})\).
High-Probability Regret Bounds for Bandit Online Linear Optimization
TLDR
This paper eliminates the gap between the high-probability bounds obtained in the full-information vs. bandit settings, and improves on the previous algorithm [8], whose regret is bounded only in expectation against an oblivious adversary.