# Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit

@inproceedings{Carpentier2012BanditTM, title={Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit}, author={Alexandra Carpentier and R{\'e}mi Munos}, booktitle={AISTATS}, year={2012} }

We consider a linear stochastic bandit problem where the dimension K of the unknown parameter θ is larger than the sampling budget n. In such cases, it is in general impossible to derive sub-linear regret bounds, since usual linear bandit algorithms have a regret of O(K√n). In this paper we assume that θ is S-sparse, i.e. has at most S non-zero components, and that the set of arms is the unit ball for the ||·||₂ norm. We combine ideas from Compressed Sensing and Bandit Theory and derive an…
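The high-level recipe of combining compressed sensing with bandit exploitation can be illustrated with a minimal two-phase sketch. This is not the paper's actual algorithm: the problem sizes, noise level, and the correlation-based hard-thresholding recovery step are illustrative assumptions (a proper sparse-recovery solver such as Lasso would be used in the genuinely data-poor regime).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper).
K, S, n_explore, sigma = 100, 3, 1000, 0.1
theta = np.zeros(K)
theta[rng.choice(K, size=S, replace=False)] = rng.uniform(0.5, 1.0, size=S)

# Phase 1 (compressed-sensing-style exploration): play random arms on the
# unit sphere and observe noisy rewards r_t = <x_t, theta> + noise.
X = rng.standard_normal((n_explore, K))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm arms
rewards = X @ theta + sigma * rng.standard_normal(n_explore)

# Correlation-based sparse estimate: for random unit-norm arms,
# E[x x^T] = I/K, so K * X^T r / n concentrates around theta;
# hard-threshold to the S largest coordinates.
theta_hat = K * (X.T @ rewards) / n_explore
support = np.argsort(np.abs(theta_hat))[-S:]
theta_sparse = np.zeros(K)
theta_sparse[support] = theta_hat[support]

# Phase 2 (exploitation): play the unit-ball arm aligned with the estimate.
best_arm = theta_sparse / np.linalg.norm(theta_sparse)
```

The estimate `theta_sparse` has at most S non-zero entries, and `best_arm` lies on the unit sphere, matching the arm set described in the abstract.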

## 83 Citations

Structured Stochastic Linear Bandits (DRAFT)

- Computer Science, Mathematics
- 2016

This paper precisely characterizes how the regret grows for any norm structure in terms of the Gaussian width and shows regret bounds that remove a √p term.

Structured Stochastic Linear Bandits

- Computer Science, Mathematics, ArXiv
- 2016

This paper focuses on constructing confidence ellipsoids that contain the unknown parameter across all rounds with high probability, and shows that the radius of such ellipsoids depends on the Gaussian width of sets associated with the norm capturing the structure.

Sparse Stochastic Bandits

- Computer Science, COLT
- 2017

This work considers the sparse case of the classical multi-armed bandit problem in the sense that only a small number of arms, namely s < d, have a positive expected reward and provides an algorithm whose regret scales with s instead of d.

High-Dimensional Sparse Linear Bandits

- Computer Science, NeurIPS
- 2020

A novel $\Omega(n^{2/3})$ dimension-free minimax regret lower bound is derived for sparse linear bandits in the data-poor regime where the horizon is smaller than the ambient dimension and where the feature vectors admit a well-conditioned exploration distribution.

High-Dimensional Gaussian Process Bandits

- Computer Science, NIPS
- 2013

The SI-BO algorithm is presented, which leverages recent low-rank matrix recovery techniques to learn the underlying subspace of the unknown function and applies Gaussian Process Upper Confidence sampling for optimization of the function.

Information Directed Sampling for Sparse Linear Bandits

- Computer Science, NeurIPS
- 2021

This work explores the use of information-directed sampling (IDS), which naturally balances the information-regret trade-off, and develops a class of information-theoretic Bayesian regret bounds that nearly match existing lower bounds on a variety of problem instances.

Covariance-adapting algorithm for semi-bandits with application to sparse outcomes

- Computer Science, COLT
- 2020

A new lower bound on the regret on this family of sub-exponential distributions is proved, that is parameterized by the unknown covariance matrix, a tighter quantity than the sub-Gaussian matrix.

Sparse linear contextual bandits via relevance vector machines

- Computer Science, 2017 International Conference on Sampling Theory and Applications (SampTA)
- 2017

This paper presents a novel approach that leverages ideas from linear Thompson sampling and relevance vector machines, yielding a scalable method that adapts to the unknown sparse support and exploits sparsity in the weight vector.

On Two Continuum Armed Bandit Problems in High Dimensions

- Computer Science, Theory of Computing Systems
- 2014

By placing suitable assumptions on the smoothness of the rewards, randomized algorithms are derived for both problems that achieve nearly optimal regret bounds in terms of the number of rounds n.

CBRAP: Contextual Bandits with RAndom Projection

- Computer Science, AAAI
- 2017

This paper proposes Contextual Bandits via RAndom Projection (CBRAP), an algorithm for the linear-payoff setting designed for high-dimensional contextual data, and proves an upper regret bound for the proposed algorithm that depends on the reduced dimension.
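The dimensionality-reduction step at the heart of such random-projection approaches can be sketched as follows; the dimensions and the Gaussian projection are illustrative assumptions, not CBRAP's exact construction:

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 1000, 32          # ambient and projected dimensions (illustrative)

# Johnson-Lindenstrauss-style Gaussian projection: scaling by 1/sqrt(m)
# approximately preserves norms and inner products in expectation.
P = rng.standard_normal((m, d)) / np.sqrt(m)

context = rng.standard_normal(d)   # a high-dimensional context vector
low_dim = P @ context              # the bandit algorithm then runs in m dims
ratio = np.linalg.norm(low_dim) / np.linalg.norm(context)
```

Any linear bandit algorithm run on `low_dim` then pays regret in terms of m rather than d, at the cost of the projection's approximation error.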

## References

Showing 1-10 of 25 references

Linearly Parameterized Bandits

- Computer Science, Mathematics, Math. Oper. Res.
- 2010

It is proved that the regret and Bayes risk are of order Θ(r√T), by establishing a lower bound for an arbitrary policy and showing that a matching upper bound is obtained through a policy that alternates between exploration and exploitation phases.

Stochastic Linear Optimization under Bandit Feedback

- Computer Science, Mathematics, COLT
- 2008

A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of “upper confidence bounds” are presented.
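The "upper confidence bounds" idea referred to in the summary can be sketched for the classical k-armed case. This is a minimal, illustrative UCB1 implementation; the Bernoulli arms and the horizon are assumptions for the demo, not details from the cited paper:

```python
import numpy as np

def ucb1(means, horizon, rng):
    """Minimal UCB1 sketch for a k-armed Bernoulli bandit (illustrative)."""
    k = len(means)
    counts = np.zeros(k)   # pulls per arm
    sums = np.zeros(k)     # cumulative reward per arm
    total = 0.0
    for t in range(horizon):
        if t < k:
            arm = t        # initialization: pull each arm once
        else:
            # optimism: empirical mean plus an exploration bonus that
            # shrinks as an arm is pulled more often
            bonus = np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(sums / counts + bonus))
        reward = float(rng.random() < means[arm])  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts

rng = np.random.default_rng(1)
total, counts = ucb1([0.9, 0.1], horizon=2000, rng=rng)
```

Because the exploration bonus decays as O(√(log t / n_arm)), suboptimal arms are pulled only O(log T) times, which is the mechanism the linear-bandit variants above generalize from arms to confidence ellipsoids over the parameter.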

Parametric Bandits: The Generalized Linear Case

- Computer Science, NIPS
- 2010

The analysis highlights a key difficulty in generalizing linear bandit algorithms to the non-linear case, which is solved in GLM-UCB by focusing on the reward space rather than on the parameter space, and provides a tuning method based on asymptotic arguments, which leads to significantly better practical performance.

High-Probability Regret Bounds for Bandit Online Linear Optimization

- Computer Science, Mathematics, COLT
- 2008

This paper eliminates the gap between the high-probability bounds obtained in the full-information and bandit settings, and improves on the previous algorithm [8], whose regret is bounded only in expectation against an oblivious adversary.

Improved Algorithms for Linear Stochastic Bandits

- Computer Science, NIPS
- 2011

A simple modification of Auer's UCB algorithm achieves constant regret with high probability and improves the regret bound by a logarithmic factor; experiments show a vast improvement in practice.

On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems

- Computer Science
- 2008

This paper analyzes the situation where the distributions of rewards remain constant over epochs and change at unknown time instants, and establishes a lower bound for the regret in the presence of abrupt changes in the arms' reward distributions.

On Upper-Confidence Bound Policies for Switching Bandit Problems

- Computer Science, ALT
- 2011

An upper bound for the expected regret is established by upper-bounding the expectation of the number of times suboptimal arms are played, and it is shown that the discounted UCB and the sliding-window UCB both match the lower bound up to a logarithmic factor.

Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization

- Computer Science, COLT
- 2008

This work introduces an efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal O*(√T) regret and presents a novel connection between online learning and interior point methods.

Restless Bandits, Partial Conservation Laws and Indexability

- Computer Science
- 2000

We show that if performance measures in a stochastic scheduling problem satisfy a set of so-called partial conservation laws (PCL), which extend previously studied generalized conservation laws…