# Impact of Representation Learning in Linear Bandits

@inproceedings{Yang2021ImpactOR,
  title     = {Impact of Representation Learning in Linear Bandits},
  author    = {Jiaqi Yang and Wei Hu and Jason D. Lee and Simon Shaolei Du},
  booktitle = {ICLR},
  year      = {2021}
}

We study how representation learning can improve the efficiency of bandit problems. In our setting, $T$ linear bandit tasks of dimension $d$ are played concurrently, and the $T$ tasks share a common $k$-dimensional ($k \ll d$) linear representation. For the finite-action setting, we present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds played for each bandit. When $T$ is sufficiently large, our algorithm…
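The shared-representation setting can be sketched in a few lines. This is a minimal illustration of the problem structure only, not the paper's algorithm; the dimensions `d = 50`, `k = 5`, `T = 20` and the noise level are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, T = 50, 5, 20  # ambient dimension, shared dimension, number of tasks

# Shared k-dimensional linear representation: an orthonormal d x k matrix B.
B, _ = np.linalg.qr(rng.standard_normal((d, k)))

# Each task t has its own k-dim parameter w_t; its d-dim parameter is theta_t = B w_t.
W = rng.standard_normal((k, T))
Theta = B @ W  # d x T matrix stacking all task parameters

# Because every task uses the same B, the stacked parameter matrix
# has rank at most k, even though it lives in dimension d.
print(np.linalg.matrix_rank(Theta))  # -> 5

def reward(t, action, noise=0.1):
    """Noisy linear reward for a d-dimensional action in task t."""
    return Theta[:, t] @ action + noise * rng.standard_normal()
```

An algorithm that exploits the shared structure only needs to estimate $B$ (jointly, across all $T N$ samples) plus one $k$-dimensional vector per task, rather than $T$ independent $d$-dimensional parameters.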

## 17 Citations

Multi-task Representation Learning with Stochastic Linear Bandits

- Computer Science, ArXiv
- 2022

This work proposes an efficient greedy policy that implicitly learns a low-dimensional representation by encouraging the matrix formed by the task regression vectors to be low-rank, and derives an upper bound on the policy's multi-task regret.

Nearly Minimax Algorithms for Linear Bandits with Shared Representation

- Computer Science, ArXiv
- 2022

Novel algorithms are given for multi-task and lifelong linear bandits with a shared representation; they match the known minimax regret lower bound up to logarithmic factors, closing the gap in existing results.

Non-Stationary Representation Learning in Sequential Multi-Armed Bandits

- Computer Science
- 2021

An online algorithm is introduced that detects task switches and learns and transfers a non-stationary representation in an adaptive fashion; a regret upper bound is derived for this algorithm, which significantly outperforms existing algorithms that do not learn the representation.

On the Power of Multitask Representation Learning in Linear MDP

- Computer Science, ArXiv
- 2021

A Least-Activated-Feature-Abundance (LAFA) criterion, denoted $\kappa$, is discovered, with which it is proved that a straightforward least-squares algorithm learns a sub-optimal policy; this theoretically explains the power of multitask representation learning in reducing sample complexity.

Non-Stationary Representation Learning in Sequential Linear Bandits

- Computer Science
- 2022

This paper proposes an online algorithm that facilitates efficient decision-making by learning and transferring non-stationary representations in an adaptive fashion, and proves that it outperforms existing algorithms that treat tasks independently.

Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms

- Computer Science, ArXiv
- 2022

This work shows that an upper bound of Õ(min(S,A)·α/ ) can be achieved by employing efficient robust mean estimators for both univariate and high-dimensional random variables, and that this bound can be improved depending on the distributions of contexts.

Adaptive Clustering and Personalization in Multi-Agent Stochastic Linear Bandits

- Computer Science
- 2021

This paper proposes a successive refinement algorithm which, for any agent, achieves regret scaling as $O(\sqrt{T/N})$. It also introduces a natural algorithm in which the personal bandit instances are initialized with estimates of the global average model, and shows that any agent $i$ whose parameter deviates from the population average by $\epsilon_i$ attains a regret scaling of Õ.

Towards Sample-efficient Overparameterized Meta-learning

- Computer Science, NeurIPS
- 2021

This work shows that, surprisingly, overparameterization arises as a natural answer to these fundamental meta-learning questions, and develops a theory explaining how feature covariance can implicitly reduce the sample complexity well below the degrees of freedom and lead to small estimation error.
