Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions
@article{He2022NearlyOA,
  title={Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions},
  author={Jiafan He and Dongruo Zhou and Tong Zhang and Quanquan Gu},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.06811}
}
We study the linear contextual bandit problem in the presence of adversarial corruption, where the reward at each round is corrupted by an adversary and the corruption level (i.e., the sum of corruption magnitudes over the horizon) is C ≥ 0. The best-known algorithms in this setting are limited in that they are either computationally inefficient, require a strong assumption on the corruption, or suffer regret at least C times worse than the regret without corruption. In this paper, to…
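To make the setting concrete, here is a minimal simulation sketch in Python. The constants, the placeholder policy, and all variable names are illustrative assumptions, not from the paper; it only shows rewards being corrupted under a total budget, with C tracked as the running sum of corruption magnitudes.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 10, 1000                    # dimension, arms per round, horizon
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)           # unknown reward parameter, ||theta|| <= 1

C_budget = 50.0                          # corruption level C: total magnitude budget
C_spent = 0.0

for t in range(T):
    contexts = rng.normal(size=(K, d))   # one feature vector per arm
    a = rng.integers(K)                  # placeholder: a real algorithm picks a here
    mean_reward = contexts[a] @ theta
    corruption = 0.0
    if C_spent < C_budget:               # adversary spends its budget early
        corruption = -min(1.0, C_budget - C_spent)
        C_spent += abs(corruption)       # C is the sum of |corruption| over rounds
    reward = mean_reward + corruption + rng.normal(scale=0.1)
```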
5 Citations
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
- Computer Science, ArXiv
- 2022
A novel technique to control the sum of weighted uncertainty is developed, and the resulting algorithm achieves regret bounds that either nearly match the performance lower bound or improve on existing methods, for all corruption levels and in both the known and unknown ζ cases.
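A rough sketch of the uncertainty-weighting idea: a sample whose feature direction is poorly explored gets down-weighted, so a corrupted reward there cannot move the estimator far. The function name, the weight rule min(1, α/bonus), and α itself are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def weighted_ridge_update(Sigma, b, x, r, alpha=1.0):
    """One uncertainty-weighted ridge-regression step.

    Sigma: d x d regularized covariance (start from lam * identity),
    b: accumulated response vector, x: observed feature, r: reward.
    """
    bonus = np.sqrt(x @ np.linalg.solve(Sigma, x))  # uncertainty ||x||_{Sigma^{-1}}
    w = min(1.0, alpha / bonus)                     # down-weight uncertain samples
    Sigma = Sigma + w * np.outer(x, x)
    b = b + w * r * x
    theta_hat = np.linalg.solve(Sigma, b)
    return Sigma, b, theta_hat
```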
Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs
- Computer Science, ArXiv
- 2022
This paper proposes the first computationally efficient horizon-free algorithm for linear mixture MDPs, which achieves the optimal Õ(d√K + d²) regret up to logarithmic factors.
Learning Stochastic Shortest Path with Linear Function Approximation
- Computer Science, ICML
- 2022
A novel algorithm with Hoeffding-type confidence sets is proposed for learning the linear mixture SSP, which provably achieves a near-optimal regret guarantee, and a lower bound of Ω(dB⋆√K) is proved.
Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes
- Mathematics, Computer Science
- 2022
This work proposes the first computationally efficient algorithm that achieves the nearly minimax optimal regret for episodic time-inhomogeneous linear Markov decision processes (linear MDPs).
Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear Bandit Algorithms
- Computer Science, ArXiv
- 2022
Surprisingly, in this paper it is shown that the stochastic contextual problem can be solved as if it were a linear bandit problem, and a novel reduction framework is established that converts every stochastic contextual linear bandit instance to a linear bandit instance, when the context distribution is known.
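The flavor of the reduction can be illustrated on a toy instance with finitely many contexts and arms: once the context distribution is known, each deterministic policy collapses to a single expected feature vector, and a plain linear bandit algorithm can act over those vectors. The brute-force enumeration below is purely illustrative and is not the paper's construction.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n_contexts, n_arms, d = 3, 2, 4
p = np.array([0.5, 0.3, 0.2])                   # known context distribution
phi = rng.normal(size=(n_contexts, n_arms, d))  # feature map phi(context, arm)

# Each deterministic policy pi: context -> arm yields one expected feature.
policies = list(itertools.product(range(n_arms), repeat=n_contexts))
policy_features = np.array([
    sum(p[c] * phi[c, pi[c]] for c in range(n_contexts))
    for pi in policies
])
# Any linear bandit algorithm can now be run on the rows of policy_features.
```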
References
Showing 1-10 of 30 references
Linear Contextual Bandits with Adversarial Corruptions
- Computer Science, ArXiv
- 2021
A gap-dependent regret bound is proved for the proposed algorithm; the bound is instance-dependent and thus leads to better performance on benign practical instances, and the algorithm is the first variance-aware, corruption-robust algorithm for contextual bandits.
Stochastic Linear Bandits Robust to Adversarial Attacks
- Computer Science, AISTATS
- 2021
In a contextual setting, the setup of diverse contexts is revisited, and it is shown that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing C.
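The greedy rule analyzed here is simply ridge regression followed by an argmax, with no optimism bonus. A minimal sketch under that reading (all names are illustrative):

```python
import numpy as np

def greedy_arm(contexts, Sigma, b):
    """contexts: K x d features this round; Sigma, b: ridge statistics."""
    theta_hat = np.linalg.solve(Sigma, b)        # ridge estimate of theta
    return int(np.argmax(contexts @ theta_hat))  # no exploration bonus at all

def update(Sigma, b, x, r):
    return Sigma + np.outer(x, x), b + r * x     # rank-one ridge update
```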
Stochastic bandits robust to adversarial corruptions
- Computer Science, STOC
- 2018
We introduce a new model of stochastic bandits with adversarial corruptions which aims to capture settings where most of the input follows a stochastic pattern but some fraction of it can be…
One Practical Algorithm for Both Stochastic and Adversarial Bandits
- Computer Science, ICML
- 2014
The algorithm is based on augmenting the EXP3 algorithm with a new control lever in the form of exploration parameters that are tailored individually to each arm, and it retains a "logarithmic" regret guarantee in the stochastic regime even when some observations are contaminated by an adversary.
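A hedged sketch of the general shape of this idea follows. The per-arm epsilon values, the learning rate eta, and the function name are placeholders; the paper's actual rule adapts the exploration parameters over time rather than fixing them.

```python
import numpy as np

def exp3_per_arm(T, K, loss_fn, eta=0.05, rng=None):
    """EXP3 with a separate exploration parameter per arm (illustrative)."""
    rng = rng or np.random.default_rng(0)
    cum_loss = np.zeros(K)                 # importance-weighted loss estimates
    epsilon = np.full(K, 1.0 / (2 * K))    # per-arm exploration parameters
    for t in range(T):
        w = np.exp(-eta * (cum_loss - cum_loss.min()))
        p = (1.0 - epsilon.sum()) * w / w.sum() + epsilon  # mix exploration in
        a = rng.choice(K, p=p)
        cum_loss[a] += loss_fn(a) / p[a]   # unbiased loss estimate for arm a
    return p
```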
An Optimal Algorithm for Stochastic and Adversarial Bandits
- Computer Science, AISTATS
- 2019
The proposed algorithm enjoys improved regret guarantees in two intermediate regimes: the moderately contaminated stochastic regime defined by Seldin and Slivkins (2014) and the stochastically constrained adversary studied by Wei and Luo (2018).
Robust Stochastic Linear Contextual Bandits Under Adversarial Attacks
- Computer Science, AISTATS
- 2022
This work provides the first robust bandit algorithm for the stochastic linear contextual bandit setting under a fully adaptive and omniscient attack, achieving sublinear regret, and shows experimentally that the proposed algorithm improves robustness against various popular attacks.
A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem
- Computer Science, NeurIPS
- 2018
A smoothed analysis is given, showing that even when contexts may be chosen by an adversary, small perturbations of the adversary's choices suffice for the greedy algorithm to achieve "no regret", perhaps (depending on the specifics of the setting) with a constant amount of initial training data.
Misspecified Linear Bandits
- Computer Science, AAAI
- 2017
A novel bandit algorithm is developed, comprising a hypothesis test for linearity followed by a decision to use either the OFUL or Upper Confidence Bound (UCB) algorithm; it provably exhibits OFUL's favorable regret performance, while for misspecified models satisfying the non-sparse deviation property, the algorithm avoids the linear regret phenomenon and falls back on UCB's sublinear regret scaling.
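The two-stage structure can be sketched as follows; the residual statistic, the threshold, and both function names are crude placeholders and not the paper's actual hypothesis test.

```python
import numpy as np

def passes_linearity_test(X, y, threshold=0.1):
    """X: n x d features, y: n observed rewards; crude residual test."""
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = np.mean((X @ theta_hat - y) ** 2)  # in-sample misfit
    return residual <= threshold

def choose_strategy(X, y):
    # Keep the linear (OFUL-style) strategy only if the data look linear.
    return "OFUL" if passes_linearity_test(X, y) else "UCB"
```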
Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously
- Computer Science, ICML
- 2021
This work develops linear bandit algorithms that automatically adapt to different environments and additionally enjoy minimax-optimal regret in completely adversarial environments, the first result of this kind to the authors' knowledge.
Better Algorithms for Stochastic Bandits with Adversarial Corruptions
- Computer Science, COLT
- 2019
A new algorithm is presented whose regret is nearly optimal, substantially improving upon previous work; it can tolerate a significant amount of corruption with virtually no degradation in performance.