• Corpus ID: 7380181

Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit

@inproceedings{Carpentier2012BanditTM,
  title={Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit},
  author={Alexandra Carpentier and R{\'e}mi Munos},
  booktitle={AISTATS},
  year={2012}
}
We consider a linear stochastic bandit problem where the dimension K of the unknown parameter θ is larger than the sampling budget n. In such cases, it is in general impossible to derive sub-linear regret bounds since usual linear bandit algorithms have a regret in O(K√n). In this paper we assume that θ is S-sparse, i.e. has at most S non-zero components, and that the space of arms is the unit ball for the ||·||₂ norm. We combine ideas from Compressed Sensing and Bandit Theory and derive an…
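Although the abstract is cut off, the setup it describes (a budget n smaller than the dimension K, an S-sparse parameter θ, and arms on the unit ℓ2 ball) can be made concrete with a toy explore-then-exploit loop. The sketch below is not the paper's algorithm; under assumed problem sizes and noise level, it only illustrates how a compressed-sensing-style hard-thresholding step turns rewards from random unit-norm arms into a sparse estimate of θ that the remaining budget can exploit.

```python
import numpy as np

# Toy explore-then-exploit sketch (not the paper's algorithm). Sizes, noise
# level, and the 50/50 budget split are illustrative assumptions.
rng = np.random.default_rng(0)
K, n, S, sigma = 1000, 200, 5, 0.1          # dimension K > budget n, theta is S-sparse

theta = np.zeros(K)
theta[rng.choice(K, size=S, replace=False)] = rng.uniform(0.5, 1.0, size=S)

# Exploration: pull random unit-norm arms; E[r * x] = theta / K for x uniform
# on the unit sphere, so rescaling by K gives a crude unbiased estimate.
n_explore = n // 2
theta_hat = np.zeros(K)
for _ in range(n_explore):
    x = rng.standard_normal(K)
    x /= np.linalg.norm(x)                  # arm on the unit ball
    r = x @ theta + sigma * rng.standard_normal()
    theta_hat += (K / n_explore) * r * x

# Compressed-sensing-style step: hard-threshold to the S largest coordinates.
support = np.argsort(np.abs(theta_hat))[-S:]
sparse_est = np.zeros(K)
sparse_est[support] = theta_hat[support]

# Exploitation: the best unit-norm arm is theta / ||theta||; play its estimate.
best_arm = sparse_est / np.linalg.norm(sparse_est)
for _ in range(n - n_explore):
    reward = best_arm @ theta + sigma * rng.standard_normal()

true_support = set(np.flatnonzero(theta))
print("recovered support entries:", len(true_support & set(support)), "of", S)
```

With K = 1000 and only n = 200 pulls, a dense least-squares estimate is hopeless; the hard-thresholding step is what lets the small budget concentrate on the S relevant coordinates, which is the kind of interaction between compressed sensing and bandit exploration that the paper studies.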


Structured Stochastic Linear Bandits (DRAFT)
TLDR
This paper precisely characterizes how the regret grows for any norm structure in terms of the Gaussian width and shows regret bounds which remove a √p term.
Structured Stochastic Linear Bandits
TLDR
This paper focuses on constructing confidence ellipsoids which contain the unknown parameter across all rounds with high probability, and shows that the radius of such ellipsoids depends on the Gaussian width of sets associated with the norm capturing the structure.
Sparse Stochastic Bandits
TLDR
This work considers the sparse case of the classical multi-armed bandit problem in the sense that only a small number of arms, namely s < d, have a positive expected reward and provides an algorithm whose regret scales with s instead of d.
High-Dimensional Sparse Linear Bandits
TLDR
A novel Ω(n^{2/3}) dimension-free minimax regret lower bound is derived for sparse linear bandits in the data-poor regime where the horizon is smaller than the ambient dimension and where the feature vectors admit a well-conditioned exploration distribution.
High-Dimensional Gaussian Process Bandits
TLDR
The SI-BO algorithm is presented, which leverages recent low-rank matrix recovery techniques to learn the underlying subspace of the unknown function and applies Gaussian Process Upper Confidence sampling for optimization of the function.
Information Directed Sampling for Sparse Linear Bandits
TLDR
This work explores the use of information-directed sampling (IDS), which naturally balances the information-regret trade-off, and develops a class of information-theoretic Bayesian regret bounds that nearly match existing lower bounds on a variety of problem instances.
Covariance-adapting algorithm for semi-bandits with application to sparse outcomes
TLDR
A new lower bound on the regret for this family of sub-exponential distributions is proved, parameterized by the unknown covariance matrix, a tighter quantity than the sub-Gaussian matrix.
Sparse linear contextual bandits via relevance vector machines
  • Davis Gilton, R. Willett
  • Computer Science
    2017 International Conference on Sampling Theory and Applications (SampTA)
  • 2017
TLDR
A novel approach that leverages ideas from linear Thompson sampling and relevance vector machines, resulting in a scalable method that exploits sparsity in the weight vector and adapts to the unknown sparse support.
On Two Continuum Armed Bandit Problems in High Dimensions
TLDR
By placing suitable assumptions on the smoothness of the rewards, randomized algorithms are derived for both problems that achieve nearly optimal regret bounds in terms of the number of rounds n.
CBRAP: Contextual Bandits with RAndom Projection
TLDR
An algorithm, Contextual Bandits via RAndom Projection (CBRAP), is proposed in the setting of linear payoffs, designed especially for high-dimensional contextual data, and an upper regret bound depending on the reduced dimension is proved for the proposed algorithm.
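Since the CBRAP entry above names random projection as its dimension-reduction device, a minimal illustration of that generic idea is sketched below: a fixed Gaussian projection applied to contexts before any linear bandit is run in the reduced space. This is not the CBRAP algorithm itself, and the dimensions are assumptions made for the example.

```python
import numpy as np

# Generic random-projection illustration (not the CBRAP algorithm itself).
# Dimensions are illustrative assumptions.
rng = np.random.default_rng(1)
d_high, d_low = 10_000, 50

# Fixed Gaussian projection; the 1/sqrt(d_low) scaling roughly preserves
# norms (Johnson-Lindenstrauss-style).
A = rng.standard_normal((d_low, d_high)) / np.sqrt(d_low)

def project(context):
    """Map a d_high-dimensional context to d_low dimensions before the bandit step."""
    return A @ context

x = rng.standard_normal(d_high)
x_low = project(x)
print(x_low.shape, np.linalg.norm(x_low) / np.linalg.norm(x))  # ratio close to 1
```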

References

SHOWING 1-10 OF 25 REFERENCES
Linearly Parameterized Bandits
TLDR
It is proved that the regret and Bayes risk are of order Θ(r√T), by establishing a lower bound for an arbitrary policy and showing that a matching upper bound is obtained through a policy that alternates between exploration and exploitation phases.
Stochastic Linear Optimization under Bandit Feedback
TLDR
A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of “upper confidence bounds” are presented; a generic sketch of the upper-confidence-bound idea appears after this reference list.
Parametric Bandits: The Generalized Linear Case
TLDR
The analysis highlights a key difficulty in generalizing linear bandit algorithms to the non-linear case, which is solved in GLM-UCB by focusing on the reward space rather than on the parameter space, and provides a tuning method based on asymptotic arguments, which leads to significantly better practical performance.
High-Probability Regret Bounds for Bandit Online Linear Optimization
TLDR
This paper eliminates the gap between the high-probability bounds obtained in the full-information vs bandit settings, and improves on the previous algorithm [8] whose regret is bounded in expectation against an oblivious adversary.
Iterative Hard Thresholding for Compressed Sensing
Improved Algorithms for Linear Stochastic Bandits
TLDR
A simple modification of Auer's UCB algorithm achieves constant regret with high probability and improves the regret bound by a logarithmic factor, though experiments show a vast improvement.
On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems
TLDR
This paper analyzes the situation where the distributions of rewards remain constant over epochs and change at unknown time instants, and establishes a lower bound for the regret in the presence of abrupt changes in the arms' reward distributions.
On Upper-Confidence Bound Policies for Switching Bandit Problems
TLDR
An upper bound for the expected regret is established by upper-bounding the expected number of times suboptimal arms are played, and it is shown that the discounted UCB and the sliding-window UCB both match the lower bound up to a logarithmic factor.
Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization
TLDR
This work introduces an efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal O*(√T) regret and presents a novel connection between online learning and interior point methods.
Restless Bandits, Partial Conservation Laws and Indexability
We show that if performance measures in a stochastic scheduling problem satisfy a set of so-called partial conservation laws (PCL), which extend previously studied generalized conservation laws…
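Several of the references above ("Stochastic Linear Optimization under Bandit Feedback", "Improved Algorithms for Linear Stochastic Bandits", and the two UCB-policy papers) revolve around the upper-confidence-bound principle for (linear) bandits. The loop below is a generic LinUCB-style sketch of that principle for a finite arm set, not a reproduction of any algorithm from these references; the confidence width beta, the arm set, and the noise level are illustrative assumptions.

```python
import numpy as np

# Generic LinUCB-style upper-confidence-bound loop (not any specific
# algorithm from the references). beta, the arm set, and the noise level
# are illustrative assumptions.
rng = np.random.default_rng(2)
d, T, lam, beta, sigma = 10, 500, 1.0, 2.0, 0.1

theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)                       # unknown parameter
arms = rng.standard_normal((50, d))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)  # finite set of unit-norm arms

V = lam * np.eye(d)   # regularized design matrix
b = np.zeros(d)       # running sum of reward-weighted arms
regret = 0.0
best_value = (arms @ theta).max()
for t in range(T):
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b                            # ridge estimate of theta
    # Optimistic index: estimated reward plus an exploration bonus that is
    # large in directions the design matrix has not yet explored.
    bonus = np.sqrt(np.einsum('ij,jk,ik->i', arms, V_inv, arms))
    x = arms[np.argmax(arms @ theta_hat + beta * bonus)]
    r = x @ theta + sigma * rng.standard_normal()
    V += np.outer(x, x)
    b += r * x
    regret += best_value - x @ theta

print(f"cumulative regret after {T} rounds: {regret:.2f}")
```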