Nearly Minimax Algorithms for Linear Bandits with Shared Representation

@article{Yang2022NearlyMA,
  title={Nearly Minimax Algorithms for Linear Bandits with Shared Representation},
  author={Jiaqi Yang and Qi Lei and Jason D. Lee and Simon Shaolei Du},
  journal={ArXiv},
  year={2022},
  volume={abs/2203.15664}
}
We give novel algorithms for multi-task and lifelong linear bandits with shared representation. Specifically, we consider the setting where we play M linear bandits with dimension d, each for T rounds, and these M bandit tasks share a common k (≪ d) dimensional linear representation. For both the multi-task setting where we play the tasks concurrently, and the lifelong setting where we play the tasks sequentially, we come up with novel algorithms that achieve Õ(d√(kMT)…
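
To make the setting concrete, the sketch below simulates it: M task parameters are drawn from a shared k-dimensional subspace (θ_m = B w_m), and a naive per-task greedy baseline ignores that structure. This is an illustration of the problem setup, not the paper's algorithm; all names and constants are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, M, T = 20, 2, 8, 300       # ambient dim d, shared dim k << d, M tasks, T rounds
n_arms, noise = 25, 0.1

# Shared linear representation: every task parameter theta_m = B @ w_m lies in
# the k-dimensional column span of a common d x k matrix B.
B, _ = np.linalg.qr(rng.standard_normal((d, k)))
Theta = B @ rng.standard_normal((k, M))          # column m is task m's theta_m

total_regret = 0.0
for m in range(M):                               # lifelong-style: tasks in sequence
    A = np.eye(d)                                # ridge Gram matrix (lambda = 1)
    b = np.zeros(d)
    for t in range(T):
        arms = rng.standard_normal((n_arms, d))
        arms /= np.linalg.norm(arms, axis=1, keepdims=True)
        theta_hat = np.linalg.solve(A, b)        # per-task ridge estimate
        i = int(np.argmax(arms @ theta_hat))     # greedy arm choice
        x = arms[i]
        r = x @ Theta[:, m] + noise * rng.standard_normal()
        A += np.outer(x, x)
        b += r * x
        total_regret += np.max(arms @ Theta[:, m]) - x @ Theta[:, m]

print(f"average per-task regret of the naive baseline: {total_regret / M:.2f}")
```

Any method that first recovers the shared subspace B should need far fewer samples per task than this baseline, since each task then has only k rather than d effective unknowns.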

Citations

Provable Benefits of Representational Transfer in Reinforcement Learning
TLDR
A new notion of task relatedness between source and target tasks is proposed, and a novel approach for representational transfer under this assumption is developed, showing that given generative access to the source tasks, one can discover a representation using which subsequent linear RL techniques quickly converge to a near-optimal policy.
Meta-Learning Representations with Contextual Linear Bandits (2022)

References

SHOWING 1-10 OF 61 REFERENCES
Near-optimal Representation Learning for Linear Bandits and Linear RL
TLDR
A sample-efficient algorithm, MTLROFUL, is proposed, which leverages the shared representation of the M linear bandits to achieve a regret bound that significantly improves upon the baseline Õ(Md√T) achieved by solving each task independently.
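
As a rough sanity check on why sharing helps (a back-of-envelope comparison using the leading term quoted in the abstract above, not a calculation from either paper):

```latex
% Leading multi-task term vs. the independent-task baseline:
\[
  \frac{d\sqrt{kMT}}{Md\sqrt{T}} \;=\; \sqrt{\frac{k}{M}},
\]
% so when the d*sqrt(kMT) term dominates, exploiting the shared k-dimensional
% representation saves a factor of sqrt(M/k) over running a single-task
% algorithm separately on each of the M tasks.
```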
Regret Bounds for Lifelong Learning
TLDR
A lifelong learning strategy is proposed which refines the underlying data representation used by the within-task algorithm, thereby transferring information from one task to the next; the bounds hold in expectation for a general loss function and uniformly for a convex loss.
Bandit Phase Retrieval
TLDR
The analysis shows that an apparently convincing heuristic for guessing lower bounds can be misleading, and that uniform bounds on the information ratio for information-directed sampling [Russo and Van Roy, 2018] are not sufficient for optimal regret.
Non-Stationary Representation Learning in Sequential Linear Bandits
TLDR
An online algorithm is proposed that facilitates efficient decision-making by learning and transferring non-stationary representations in an adaptive fashion, and it is proved that the algorithm significantly outperforms existing ones that treat tasks independently.
Multi-Task Learning for Contextual Bandits
TLDR
An upper confidence bound-based multi-task learning algorithm for contextual bandits is proposed, a corresponding regret bound is established, and this bound is interpreted to quantify the advantages of learning in the presence of high task (arm) similarity.
Provable Lifelong Learning of Representations
TLDR
This work proposes a lifelong learning algorithm that maintains and refines the internal feature representation and proves that for any desired accuracy on all tasks, the dimension of the representation remains close to that of the underlying representation.
Impact of Representation Learning in Linear Bandits
TLDR
A new algorithm is presented whose regret bound demonstrates the benefit of representation learning in certain regimes, and an Ω(T√(kN) + √(dkNT)) regret lower bound is provided, showing that the algorithm is minimax-optimal up to poly-logarithmic factors.
Improved Algorithms for Linear Stochastic Bandits
TLDR
A simple modification of Auer's UCB algorithm is shown to achieve constant regret with high probability and to improve the known regret bound by a logarithmic factor, with experiments showing a far larger improvement in practice.
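
For context, the optimism rule from this line of work (an OFUL/LinUCB-style confidence ellipsoid) can be sketched compactly. The version below assumes a fixed confidence radius beta, whereas the paper derives a tighter data-dependent radius; the function name and signature are ours.

```python
import numpy as np

def oful_step(A, b, arms, beta):
    """One round of a simplified OFUL/LinUCB-style rule.

    A    : (d, d) ridge Gram matrix, lambda*I + sum_s x_s x_s^T
    b    : (d,)   response vector,   sum_s r_s x_s
    arms : (n, d) candidate feature vectors for this round
    beta : confidence-ellipsoid radius (fixed here; data-dependent in the paper)
    """
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b            # ridge estimate of the unknown parameter
    # Optimistic index: estimated reward plus ellipsoid width along each arm.
    width = np.sqrt(np.einsum("nd,de,ne->n", arms, A_inv, arms))
    return int(np.argmax(arms @ theta_hat + beta * width))
```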
Optimal Gradient-based Algorithms for Non-concave Bandit Optimization
TLDR
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave, including the low-rank generalized linear bandit problem and the two-layer neural network with polynomial activation bandit problem, and provides an algorithm that is minimax-optimal in the dimension.
Multi-task Linear Bandits
TLDR
It is shown that sequential transfer may indeed have a positive impact also in the multi-armed bandit problem, with a significant reduction in the regret; this approach, known as multi-task or transfer learning, has made significant gains in supervised learning scenarios.