Corpus ID: 212414989

Stochastic Linear Contextual Bandits with Diverse Contexts

Weiqiang Wu, Jiahai Yang, Cong Shen
In this paper, we investigate the impact of context diversity on stochastic linear contextual bandits. In contrast to the conventional view that contexts make bandit learning more difficult, we show that when the contexts are sufficiently diverse, the learner can use information obtained during exploitation to shorten the exploration process, thus achieving reduced regret. We design the LinUCB-d algorithm and propose a novel approach to analyze its regret performance. The main…
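The abstract describes a LinUCB-style algorithm; the paper's own LinUCB-d variant is not detailed in this snippet, so the following is a minimal sketch of standard LinUCB arm selection and its rank-one statistics update, which LinUCB-d builds on. The function names and the `alpha` exploration parameter are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def linucb_choose(contexts, A, b, alpha=1.0):
    """Pick the arm whose context maximizes the LinUCB index.

    contexts: (K, d) array, one feature vector per arm.
    A: (d, d) regularized design matrix (I + sum of x x^T).
    b: (d,) reward-weighted sum of played contexts.
    """
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b                      # ridge-regression estimate of theta*
    widths = np.sqrt(np.einsum("kd,de,ke->k", contexts, A_inv, contexts))
    return int(np.argmax(contexts @ theta + alpha * widths))

def linucb_update(A, b, x, reward):
    """Rank-one update of the sufficient statistics after observing a reward."""
    A = A + np.outer(x, x)
    b = b + reward * x
    return A, b
```

With `alpha = 0` the rule is purely greedy; the diversity argument in the paper is, roughly, that diverse contexts let exploitation rounds double as exploration, shrinking the confidence widths without explicit forced exploration.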


Robust Stochastic Linear Contextual Bandits Under Adversarial Attacks
This work provides the first robust bandit algorithm for the stochastic linear contextual bandit setting under a fully adaptive and omniscient attack, significantly improving robustness against various popular kinds of attacks.
An Efficient Algorithm For Generalized Linear Bandit: Online Stochastic Gradient Descent and Thompson Sampling
The proposed SGD-TS algorithm uses a single-step SGD update to exploit past information and Thompson Sampling for exploration, achieving regret guarantees with total time complexity that scales linearly in T and d, where T is the total number of rounds and d is the number of features.
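The SGD-TS summary combines two ingredients that can be sketched independently: a single stochastic-gradient step on the squared loss, and Thompson-style exploration by perturbing the point estimate before acting greedily. This is a generic sketch under those two assumptions, not the paper's exact update; the learning rate and noise scale are illustrative.

```python
import numpy as np

def sgd_step(theta, x, reward, lr=0.1):
    """One SGD step on the squared loss (x . theta - reward)^2."""
    grad = 2.0 * (x @ theta - reward) * x
    return theta - lr * grad

def ts_choose(contexts, theta, noise=0.1, rng=None):
    """Thompson-style exploration: perturb the estimate, then act greedily."""
    rng = rng or np.random.default_rng()
    theta_tilde = theta + noise * rng.standard_normal(theta.shape)
    return int(np.argmax(contexts @ theta_tilde))
```

The appeal of this pairing, as the summary notes, is cost: one gradient step per round is O(d), avoiding the matrix inversions that make exact generalized-linear-bandit updates expensive.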
Leveraging Good Representations in Linear Contextual Bandits
This paper provides a systematic analysis of the different definitions of “good” representations proposed in the literature and proposes a novel selection algorithm that adapts to the best representation in a set of M candidates, achieving constant regret whenever a “good” representation is available in the set.


Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and observes the reward only for the chosen action.
Contextual Bandits with Linear Payoff Functions
An O(√(Td ln³(KT ln(T)/δ))) regret bound is proved that holds with probability 1 − δ for the simplest known upper confidence bound algorithm for this problem.
Efficient Optimal Learning for Contextual Bandits
This work provides the first efficient algorithm with optimal regret; it uses a cost-sensitive classification learner as an oracle and has running time polylog(N), where N is the number of classification rules among which the oracle might choose.
Finite-Time Analysis of Kernelised Contextual Bandits
This work proposes KernelUCB, a kernelised UCB algorithm, gives a cumulative regret bound through a frequentist analysis, and improves on the regret bound of GP-UCB in the agnostic case, both in terms of the kernel-dependent quantity and the RKHS norm of the reward function.
Mostly Exploration-Free Algorithms for Contextual Bandits
Surprisingly, it is found that a simple greedy algorithm can be rate-optimal if there is sufficient randomness in the observed contexts (covariates); this is proved to always be the case for a two-armed bandit under a general class of context distributions satisfying a condition the authors term covariate diversity.
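The exploration-free result above is easy to make concrete: the greedy policy simply fits ridge regression to past observations and always plays the empirically best arm, with no confidence bonus. A minimal sketch, assuming linear rewards and a caller-supplied context stream (the function names and signatures are illustrative):

```python
import numpy as np

def greedy_linear_bandit(contexts_stream, rewards_fn, d, T, ridge=1.0):
    """Pure exploitation: always play the empirically best arm.

    contexts_stream(t) -> (K, d) array of arm contexts at round t.
    rewards_fn(t, k)   -> observed reward for playing arm k at round t.
    Rate-optimal only when context randomness supplies free exploration.
    """
    A = ridge * np.eye(d)                  # regularized design matrix
    b = np.zeros(d)
    total = 0.0
    for t in range(T):
        X = contexts_stream(t)
        theta = np.linalg.solve(A, b)      # ridge-regression estimate
        k = int(np.argmax(X @ theta))      # greedy: no exploration bonus
        r = rewards_fn(t, k)
        A += np.outer(X[k], X[k])
        b += r * X[k]
        total += r
    return theta, total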
A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem
A smoothed analysis is given, showing that even when contexts may be chosen by an adversary, small perturbations of the adversary's choices suffice for the greedy algorithm to achieve no regret, possibly (depending on the specifics of the setting) with a constant amount of initial training data.
Tighter Bounds for Multi-Armed Bandits with Expert Advice
A new algorithm, similar in spirit to EXP4, achieves a regret bound of O(√(TS log M)), where the parameter S measures the extent to which expert recommendations agree; the key to this algorithm is a linear-programming-based exploration strategy that is optimal in a certain sense.
A contextual-bandit approach to personalized news article recommendation
This work models personalized recommendation of news articles as a contextual bandit problem: a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.
Improved Algorithms for Linear Stochastic Bandits
A simple modification of Auer's UCB algorithm achieves constant regret with high probability and improves the regret bound by a logarithmic factor; experiments show a vast improvement over existing methods.
Finite-time Analysis of the Multiarmed Bandit Problem
This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support. Expand
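The classic UCB1 policy from this finite-time analysis is short enough to state in full: play each arm once, then repeatedly play the arm maximizing the empirical mean plus the bonus √(2 ln t / n). A minimal self-contained sketch (the helper names are mine, not the paper's):

```python
import math

def ucb1_index(mean, n, t):
    """UCB1 index: empirical mean plus exploration bonus sqrt(2 ln t / n)."""
    return mean + math.sqrt(2.0 * math.log(t) / n)

def run_ucb1(arms, T):
    """Play T rounds of UCB1; arms is a list of zero-argument reward callables."""
    K = len(arms)
    pulls = [0] * K
    means = [0.0] * K
    for t in range(1, T + 1):
        if t <= K:
            k = t - 1                              # initialize: pull each arm once
        else:
            k = max(range(K), key=lambda i: ucb1_index(means[i], pulls[i], t))
        r = arms[k]()
        pulls[k] += 1
        means[k] += (r - means[k]) / pulls[k]      # incremental mean update
    return pulls, means
```

For bounded rewards this achieves the logarithmic regret the summary refers to, uniformly over time: the suboptimal arm's pull count grows like O(ln T / Δ²), where Δ is its gap to the best arm.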