• Corpus ID: 90237357

# Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits

    @article{Li2019NearlyMR,
      title={Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits},
      author={Yingkai Li and Yining Wang and Yuanshuo Zhou},
      journal={ArXiv},
      year={2019},
      volume={abs/1904.00242}
    }
• Published 30 March 2019
• Computer Science
• ArXiv
We study the linear contextual bandit problem with finite action sets. When the problem dimension is $d$, the time horizon is $T$, and there are $n \leq 2^{d/2}$ candidate actions per time period, we (1) show that the minimax expected regret is $\Omega(\sqrt{dT (\log T) (\log n)})$ for every algorithm, and (2) introduce a Variable-Confidence-Level (VCL) SupLinUCB algorithm whose regret matches the lower bound up to iterated logarithmic factors. Our algorithmic result saves two $\sqrt{\log T}$ …

## Tables from this paper

## Citations

• Computer Science, ArXiv, 2023: It is shown that the classical LinUCB algorithm, designed for the realizable case, is automatically robust against gap-adjusted misspecification, achieving a near-optimal $\sqrt{T}$ regret for problems where the best-known regret is almost linear in the time horizon $T$.
• Computer Science, ArXiv, 2022: This work proposes a novel contextual bandit algorithm for generalized linear rewards with an $\tilde{O}(\sqrt{\kappa^{-1} \phi T})$ regret over $T$ rounds, where $\phi$ is the minimum eigenvalue of the covariance of contexts and $\kappa$ is a lower bound on the variance of rewards.
• Computer Science, AISTATS, 2021: A regret upper bound of $O(\sqrt{d^2 T \log T}) \times \mathrm{poly}(\log\log T)$ is proved, where $d$ is the domain dimension and $T$ is the time horizon.
• Computer Science, ArXiv, 2021: Novel analyses that significantly improve existing regret bounds are presented, relying critically on a peeling-based regret analysis that leverages the elliptical potential "count" lemma.
• Computer Science, ArXiv, 2022: The first variance-aware regret guarantee for sparse linear bandits is presented, and two recent algorithms are taken as black boxes to illustrate that the claimed bounds indeed hold, where the first algorithm can handle unknown-variance cases and the second is more efficient.
• Computer Science, ArXiv, 2023: A variance-adaptive algorithm for linear mixture MDPs is proposed, which achieves a problem-dependent, horizon-free regret bound that gracefully reduces to nearly constant regret for deterministic MDPs.
• Computer Science, 2022: The theoretical analysis demonstrates that, for Bernoulli multi-armed bandits, EBUCB can achieve the optimal regret order if the inference error measured by two different $\alpha$-divergences is less than a constant, regardless of how large that constant is.
• Computer Science, ArXiv, 2020: Improved fixed-design confidence bounds for the linear logistic model are proposed by leveraging the self-concordance of the logistic loss, inspired by Faury et al. (2020), improving upon previous state-of-the-art performance guarantees.
• Computer Science, Mathematics, ArXiv, 2021: This work shows how to construct variance-aware confidence sets for linear bandits and linear mixture Markov decision processes, and obtains the first regret bound that scales only logarithmically with the horizon $H$ in reinforcement learning with linear function approximation, exponentially improving existing results.
• Computer Science, COLT, 2021: A new Bernstein-type concentration inequality for self-normalized martingales is proposed for linear bandit problems with bounded noise, together with UCRL-VTR, a new computationally efficient algorithm with linear function approximation for the aforementioned linear mixture MDPs in the episodic undiscounted setting.
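The paper and many of the citing works above build on LinUCB-style upper confidence bounds for linear contextual bandits. As background, here is a minimal LinUCB sketch, not the paper's VCL SupLinUCB; the ridge regularizer, the exploration weight `alpha`, and the `contexts`/`rewards_fn` simulation interface are illustrative assumptions:

```python
import numpy as np

def linucb(contexts, rewards_fn, T, d, alpha=1.0):
    """Minimal LinUCB sketch: ridge-regression estimate of the reward
    parameter plus an exploration bonus per candidate action."""
    A = np.eye(d)        # regularized Gram matrix of played features
    b = np.zeros(d)      # running sum of reward-weighted features
    total_reward = 0.0
    for t in range(T):
        X = contexts(t)                      # (n, d) features this round
        theta_hat = np.linalg.solve(A, b)    # ridge estimate
        A_inv = np.linalg.inv(A)
        # UCB score = estimated reward + alpha * sqrt(x^T A^{-1} x)
        bonus = np.sqrt(np.sum((X @ A_inv) * X, axis=1))
        a = int(np.argmax(X @ theta_hat + alpha * bonus))
        r = rewards_fn(t, X[a])
        A += np.outer(X[a], X[a])
        b += r * X[a]
        total_reward += r
    return total_reward
```

The exploration bonus shrinks in directions the algorithm has already played often, which is the mechanism the SupLinUCB-style analyses above refine.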

## References

SHOWING 1-10 OF 39 REFERENCES

• Computer Science, COLT, 2018: This work develops several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i.i.d. problems with sophisticated statistical tests so as to dynamically adapt to changes in distribution.
• Computer Science, Period. Math. Hung., 2010: For this modified UCB algorithm, an improved bound on the regret with respect to the optimal reward is given for $K$-armed bandits after $T$ trials.
• Computer Science, ArXiv, 2018: This self-contained contribution simultaneously presents state-of-the-art techniques for regret minimization in bandit models and an elementary construction of non-asymptotic confidence bounds based on the empirical likelihood method for bounded distributions.
• Computer Science, Mathematics, COLT, 2017: The gap-entropy conjecture is made: for any Gaussian Best-$1$-Arm instance with gaps of the form $2^{-k}$, any $\delta$-correct monotone algorithm requires $\Omega\left(H(I)\cdot\left(\ln\delta^{-1} + \mathsf{Ent}(I)\right)\right)$ samples in expectation.
• Computer Science, Mathematics, COLT, 2008: A nearly complete characterization of the classical stochastic $k$-armed bandit problem in terms of both upper and lower bounds on the regret is given, and two variants of an algorithm based on the idea of "upper confidence bounds" are presented.
• Computer Science, Mathematics, J. Mach. Learn. Res., 2003: This work considers the multi-armed bandit problem under the PAC ("probably approximately correct") model, and generalizes the lower bound to a Bayesian setting and to the case where the statistics of the arms are known but the identities of the arms are not.
• Computer Science, AISTATS, 2011: An $O(\sqrt{Td \ln^3(KT \ln(T)/\delta)})$ regret bound is proved that holds with probability $1-\delta$ for the simplest known upper confidence bound algorithm for this problem.
• Computer Science, ICML, 2013: Two novel, parameter-free algorithms for identifying the best arm are presented, in two different settings: given a target confidence and given a target budget of arm pulls; upper bounds whose gap from the lower bound is only doubly-logarithmic in the problem parameters are proved.
• This paper introduces the first strategy for stochastic bandits with unit-variance Gaussian noise that is simultaneously minimax optimal up to constant factors, asymptotically optimal, and never …
• Economics, Computer Science, Found. Trends Mach. Learn., 2012: The focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs.
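Several of the $K$-armed bandit references above refine the classical upper-confidence-bound idea. A textbook-style UCB1 sketch of that idea, not the algorithm of any single reference; the exploration constant 2 and the Bernoulli reward model are illustrative assumptions:

```python
import math
import random

def ucb1(means, T, rng):
    """UCB1 sketch for a K-armed Bernoulli bandit: pull each arm once,
    then pick the arm maximizing empirical mean + sqrt(2 ln t / pulls)."""
    K = len(means)
    counts = [0] * K     # pulls per arm
    sums = [0.0] * K     # cumulative reward per arm
    total = 0.0
    for t in range(1, T + 1):
        if t <= K:
            a = t - 1    # initialization: one pull per arm
        else:
            a = max(range(K),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        r = 1.0 if rng.random() < means[a] else 0.0
        counts[a] += 1
        sums[a] += r
        total += r
    return total, counts
```

The bonus term forces every arm to be sampled at a logarithmic rate, which is what yields the logarithmic problem-dependent regret bounds discussed in the $K$-armed references.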