Variance-Aware Sparse Linear Bandits

@article{Dai2022VarianceAwareSL,
  title={Variance-Aware Sparse Linear Bandits},
  author={Yan Dai and Ruosong Wang and Simon Shaolei Du},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.13450}
}
It is well-known that the worst-case minimax regret for sparse linear bandits is $\widetilde{\Theta}(\sqrt{dT})$, where $d$ is the ambient dimension and $T$ is the number of time steps (ignoring the dependency on sparsity). On the other hand, in the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve an $\widetilde{O}(1)$ regret, which is (nearly) independent of $d$ and $T$. In this paper, we present the first variance-aware regret guarantee for sparse…
1 Citations

Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear Bandit Algorithms

Surprisingly, this paper shows that the stochastic contextual problem can be solved as if it were a linear bandit problem, and establishes a novel reduction framework that converts every stochastic contextual linear bandit instance to a linear bandit instance when the context distribution is known.

References

Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs

Novel analyses that significantly improve their regret bounds are presented; they rely on a peeling-based regret analysis that leverages the elliptical potential ‘count’ lemma.

Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit

This paper combines ideas from Compressed Sensing and Bandit Theory and derives an algorithm with a regret bound in $O(S\sqrt{n})$, with an application to the problem of optimizing a function that depends on many variables, only a small number of which are relevant.

High-Dimensional Sparse Linear Bandits

A novel $\Omega(n^{2/3})$ dimension-free minimax regret lower bound is derived for sparse linear bandits in the data-poor regime where the horizon is smaller than the ambient dimension and where the feature vectors admit a well-conditioned exploration distribution.

Online Sparse Reinforcement Learning

It is shown that if the learner has oracle access to a policy that collects well-conditioned data, then a variant of Lasso fitted Q-iteration enjoys a nearly dimension-free regret, which indicates that in the large-action setting, the difficulty of learning can be attributed to the difficulty of finding a good exploratory policy.

Information Directed Sampling and Bandits with Heteroscedastic Noise

This work introduces a frequentist regret framework, similar to the Bayesian analysis of Russo and Van Roy (2014), and proves a new high-probability regret bound for general, possibly randomized policies, depending on a quantity the authors call the regret-information ratio.

Bandit Learning with General Function Classes: Heteroscedastic Noise and Variance-dependent Regret Bounds

Under this framework, an algorithm is designed that constructs a variance-aware confidence set based on empirical risk minimization and attains a variance-dependent regret bound for generalized linear bandits; a second algorithm, based on a follow-the-regularized-leader (FTRL) subroutine and online-to-confidence-set conversion, can achieve a tighter variance-dependent regret under certain conditions.

Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits

This work provides the first inroad into a theoretical understanding of dynamic batch learning in high-dimensional sparse linear contextual bandits through a regret lower bound and provides a matching upper bound (up to log factors), thus prescribing an optimal scheme for this problem.

Stochastic Linear Optimization under Bandit Feedback

A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of “upper confidence bounds” are presented.

Information Directed Sampling for Sparse Linear Bandits

This work explores the use of information-directed sampling (IDS), which naturally balances the information-regret trade-off, and develops a class of information-theoretic Bayesian regret bounds that nearly match existing lower bounds on a variety of problem instances.

Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP

This paper presents new variance-aware confidence sets for linear bandits and linear mixture Markov Decision Processes (MDPs) and develops three technical ideas that may be of independent interest: applications of the peeling technique to both the input norm and the variance magnitude, a recursion-based estimator for the variance, and a new convex potential lemma that generalizes the seminal elliptical potential lemma.
...