# Nearly Minimax Algorithms for Linear Bandits with Shared Representation

```bibtex
@article{Yang2022NearlyMA,
  title={Nearly Minimax Algorithms for Linear Bandits with Shared Representation},
  author={Jiaqi Yang and Qi Lei and Jason D. Lee and Simon Shaolei Du},
  journal={ArXiv},
  year={2022},
  volume={abs/2203.15664}
}
```
• Published 29 March 2022
• Computer Science
• ArXiv
We give novel algorithms for multi-task and lifelong linear bandits with shared representation. Specifically, we consider the setting where we play M linear bandits with dimension d, each for T rounds, and these M bandit tasks share a common k (≪ d)-dimensional linear representation. For both the multi-task setting, where we play the tasks concurrently, and the lifelong setting, where we play tasks sequentially, we come up with novel algorithms that achieve Õ(d√(kMT)…
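The setting described in the abstract can be made concrete with a small sketch. The code below is our own illustration, not the paper's algorithm: it generates M linear bandit tasks whose d-dimensional parameters all lie in a shared k-dimensional subspace, using our own names (B for the shared feature matrix, w_m for per-task weights, theta_m = B w_m for per-task parameters).

```python
# Illustrative sketch of the shared-representation setting (not the paper's
# algorithm). All names here are our own notation.
import numpy as np

rng = np.random.default_rng(0)
d, k, M, T = 20, 3, 5, 1000   # ambient dim, shared dim (k << d), tasks, rounds

# Shared representation: a d x k matrix with orthonormal columns.
B, _ = np.linalg.qr(rng.standard_normal((d, k)))

# Each task m has parameter theta_m = B @ w_m for some k-dimensional w_m,
# so all M task parameters lie in the same k-dimensional subspace.
W = rng.standard_normal((k, M))
Theta = B @ W                  # shape (d, M); column m is theta_m

# One round of task m: play arm x (a d-dimensional vector), observe noisy reward.
def pull(m, x, noise=0.1):
    return float(x @ Theta[:, m]) + noise * rng.standard_normal()

# Sanity check: every theta_m lies (numerically) in the column span of B.
residual = Theta - B @ (B.T @ Theta)
print(np.abs(residual).max() < 1e-10)  # True
```

An algorithm that exploits the shared subspace can, intuitively, amortize the cost of learning B across all M tasks, which is the source of the improvement over solving each task independently.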
## Citations

SHOWING 1 OF 2 CITATIONS

Provable Benefits of Representational Transfer in Reinforcement Learning
• Computer Science
ArXiv
• 2022
A new notion of task relatedness between source and target tasks is proposed, and a novel approach for representational transfer under this assumption is developed, showing that given a generative access to source tasks, one can discover a representation, using which subsequent linear RL techniques quickly converge to a near-optimal policy.

## References

SHOWING 1-10 OF 61 REFERENCES
Near-optimal Representation Learning for Linear Bandits and Linear RL
• Computer Science
ICML
• 2021
A sample-efficient algorithm, MTLROFUL, is proposed, which leverages the shared representation of M linear bandits to achieve a regret bound that significantly improves upon the baseline Õ(Md√T) achieved by solving each task independently.
Regret Bounds for Lifelong Learning
• Computer Science
AISTATS
• 2017
A lifelong learning strategy is proposed which refines the underlying data representation used by the within-task algorithm, thereby transferring information from one task to the next, and bounds are in expectation for a general loss function, and uniform for a convex loss.
Bandit Phase Retrieval
• Computer Science
NeurIPS
• 2021
The analysis shows that an apparently convincing heuristic for guessing lower bounds can be misleading and that uniform bounds on the information ratio for information-directed sampling Russo and Roy [2018] are not sufficient for optimal regret.
Non-Stationary Representation Learning in Sequential Linear Bandits
• Computer Science
IEEE Open Journal of Control Systems
• 2022
An online algorithm is proposed that facilitates efficient decision-making by learning and transferring non-stationary representations in an adaptive fashion and it is proved that the algorithm significantly outperforms the existing ones that treat tasks independently.
Multi-Task Learning for Contextual Bandits
• Computer Science
NIPS
• 2017
An upper confidence bound-based multi-task learning algorithm for contextual bandits is proposed, a corresponding regret bound is established, and this bound is interpreted to quantify the advantages of learning in the presence of high task (arm) similarity.
Provable Lifelong Learning of Representations
• Computer Science
AISTATS
• 2022
This work proposes a lifelong learning algorithm that maintains and refines the internal feature representation and proves that for any desired accuracy on all tasks, the dimension of the representation remains close to that of the underlying representation.
Impact of Representation Learning in Linear Bandits
• Computer Science
ICLR
• 2021
A new algorithm is presented which achieves a regret bound demonstrating the benefit of representation learning in certain regimes, and an $\Omega(T\sqrt{kN} + \sqrt{dkNT})$ regret lower bound is provided, showing that the algorithm is minimax-optimal up to poly-logarithmic factors.
Improved Algorithms for Linear Stochastic Bandits
• Computer Science
NIPS
• 2011
A simple modification of Auer's UCB algorithm achieves constant regret with high probability and improves the regret bound by a logarithmic factor; experiments show a vast improvement in practice.
Optimal Gradient-based Algorithms for Non-concave Bandit Optimization
• Computer Science
NeurIPS
• 2021
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave, including low-rank generalized linear bandit problems and the two-layer neural network with polynomial activation bandit problem, and provides an algorithm that is minimax-optimal in the dimension.