Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions

Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu
We study the linear contextual bandit problem in the presence of adversarial corruption, where the reward at each round is corrupted by an adversary and the corruption level (i.e., the sum of corruption magnitudes over the horizon) is C ≥ 0. The best-known algorithms in this setting are limited in that they are either computationally inefficient, require a strong assumption on the corruption, or suffer regret at least C times worse than the regret without corruption. In this paper, to…
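The setting described in the abstract can be illustrated with a minimal simulation (a sketch only: the adversary, constants, and the generic LinUCB-style learner below are illustrative assumptions, not the algorithm proposed in the paper). Rewards are linear in the chosen context plus noise, and each round the adversary may add a perturbation c_t; the corruption level is C = Σ|c_t|.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 10, 1000           # dimension, arms per round, horizon
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)  # unknown parameter, ||theta|| = 1

def corruption(t, means):
    # illustrative adversary: corrupt the first 50 rounds by an amount
    # that flips the sign of the best achievable reward
    return -2.0 * means.max() if t < 50 else 0.0

A = np.eye(d)                   # ridge-regression statistics (LinUCB-style)
b = np.zeros(d)
C = 0.0                         # corruption level: sum of |c_t|
total_regret = 0.0

for t in range(T):
    X = rng.normal(size=(K, d))             # contexts for this round
    theta_hat = np.linalg.solve(A, b)
    Ainv = np.linalg.inv(A)
    # optimistic index: estimate plus an exploration bonus
    bonus = np.sqrt(np.einsum('ki,ij,kj->k', X, Ainv, X))
    a = int(np.argmax(X @ theta_hat + bonus))
    means = X @ theta
    c_t = corruption(t, means)
    C += abs(c_t)
    r = means[a] + c_t + 0.1 * rng.normal()  # observed (corrupted) reward
    total_regret += means.max() - means[a]
    A += np.outer(X[a], X[a])
    b += r * X[a]

print(f"corruption level C = {C:.1f}, pseudo-regret = {total_regret:.1f}")
```

Note that the vanilla optimistic learner here has no defense against corruption; the point of the sketch is only to make the quantities C and regret concrete.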


Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes

A novel technique to control the sum of weighted uncertainty is developed, and the resulting algorithm achieves regret bounds that either nearly match the performance lower bound or improve on existing methods for all corruption levels, in both the known and unknown ζ cases.

Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs

This paper proposes the first computationally efficient horizon-free algorithm for linear mixture MDPs, which achieves the optimal Õ(d√K + d²) regret up to logarithmic factors.

Learning Stochastic Shortest Path with Linear Function Approximation

A novel algorithm with Hoeffding-type confidence sets for learning the linear mixture SSP, which provably achieves a near-optimal regret guarantee, together with a proved lower bound of Ω(dB⋆√K).

Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes

This work proposes the first computationally efficient algorithm that achieves the nearly minimax optimal regret for episodic time-inhomogeneous linear Markov decision processes (linear MDPs).

Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear Bandit Algorithms

Surprisingly, this paper shows that the stochastic contextual problem can be solved as if it were a linear bandit problem, and establishes a novel reduction framework that converts every stochastic contextual linear bandit instance into a linear bandit instance when the context distribution is known.



Linear Contextual Bandits with Adversarial Corruptions

A gap-dependent regret bound is proved for the proposed algorithm; the bound is instance-dependent and thus yields better performance on benign practical instances, and the algorithm is the first variance-aware, corruption-robust algorithm for contextual bandits.

Stochastic Linear Bandits Robust to Adversarial Attacks

In the contextual setting, the setup of diverse contexts is revisited, and it is shown that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.

Stochastic bandits robust to adversarial corruptions

We introduce a new model of stochastic bandits with adversarial corruptions which aims to capture settings where most of the input follows a stochastic pattern but some fraction of it can be adversarially corrupted.

One Practical Algorithm for Both Stochastic and Adversarial Bandits

The algorithm is based on augmenting the EXP3 algorithm with a new control lever in the form of exploration parameters that are tailored individually to each arm, and it retains a "logarithmic" regret guarantee in the stochastic regime even when some observations are contaminated by an adversary.
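For reference, a minimal sketch of the standard EXP3 base algorithm (textbook version with a single global exploration rate γ; the per-arm exploration parameters described in the summarized paper are a modification not reproduced here, and the toy instance below is illustrative):

```python
import numpy as np

def exp3(reward_fn, K, T, gamma=0.1, seed=0):
    """Standard EXP3: exponential weights over arms, uniform exploration
    mixing, and importance-weighted reward estimates for the pulled arm."""
    rng = np.random.default_rng(seed)
    w = np.ones(K)
    collected = 0.0
    for t in range(T):
        p = (1 - gamma) * w / w.sum() + gamma / K
        a = rng.choice(K, p=p)
        r = reward_fn(t, a)            # observed reward in [0, 1]
        collected += r
        x_hat = r / p[a]               # unbiased importance-weighted estimate
        w[a] *= np.exp(gamma * x_hat / K)
    return collected

# toy stochastic instance: arm 2 is best, with mean reward 0.7
means = np.array([0.3, 0.5, 0.7])
bandit_rng = np.random.default_rng(1)
reward = lambda t, a: float(bandit_rng.random() < means[a])
total = exp3(reward, K=3, T=5000)
print(total)
```

On this easy stochastic instance the collected reward should clearly exceed what uniform play would give (expected 2500 over 5000 rounds), approaching the best arm's 3500.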

An Optimal Algorithm for Stochastic and Adversarial Bandits

The proposed algorithm enjoys improved regret guarantees in two intermediate regimes: the moderately contaminated stochastic regime defined by Seldin and Slivkins (2014) and the stochastically constrained adversary studied by Wei and Luo (2018).

Robust Stochastic Linear Contextual Bandits Under Adversarial Attacks

This work provides the first robust bandit algorithm for the stochastic linear contextual bandit setting under a fully adaptive and omniscient attack with sub-linear regret, and demonstrates experimentally that the proposed algorithm improves robustness against various popular attacks.

A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem

A smoothed analysis is given, showing that even when contexts may be chosen by an adversary, small perturbations of the adversary's choices suffice for the algorithm to achieve "no regret", perhaps (depending on the specifics of the setting) with a constant amount of initial training data.
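A minimal sketch of the greedy algorithm under perturbed contexts (an illustration under assumptions, not the paper's construction: the smoothing is modeled as small additive Gaussian noise on the contexts, and all constants are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 10, 2000
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)   # unknown parameter

A, b = np.eye(d), np.zeros(d)    # ridge-regression statistics
regret = 0.0
for t in range(T):
    # adversarially chosen base contexts plus a small Gaussian
    # perturbation, mirroring the smoothed-analysis assumption
    X = rng.uniform(-1, 1, size=(K, d)) + 0.1 * rng.normal(size=(K, d))
    theta_hat = np.linalg.solve(A, b)
    a = int(np.argmax(X @ theta_hat))   # purely greedy: no exploration bonus
    means = X @ theta
    r = means[a] + 0.1 * rng.normal()
    regret += means.max() - means[a]
    A += np.outer(X[a], X[a])
    b += r * X[a]

print(f"greedy pseudo-regret over {T} rounds: {regret:.1f}")
```

Because the perturbed contexts are diverse, the ridge estimate converges regardless of which arms the greedy rule picks, so the pseudo-regret stays far below linear growth.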

Misspecified Linear Bandits

A novel bandit algorithm is developed, comprising a hypothesis test for linearity followed by a decision to use either the OFUL or Upper Confidence Bound (UCB) algorithm; it provably exhibits OFUL's favorable regret performance, while for misspecified models satisfying the non-sparse deviation property it avoids the linear regret phenomenon and falls back on UCB's sublinear regret scaling.

Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously

This work develops linear bandit algorithms that automatically adapt to different environments and additionally enjoy minimax-optimal regret in completely adversarial environments, the first result of its kind to the authors' knowledge.

Better Algorithms for Stochastic Bandits with Adversarial Corruptions

A new algorithm is presented whose regret is nearly optimal, substantially improving upon previous work, and which can tolerate a significant amount of corruption with virtually no degradation in performance.