• Corpus ID: 233740346

Impact of Representation Learning in Linear Bandits

@inproceedings{Yang2021ImpactOR,
  title={Impact of Representation Learning in Linear Bandits},
  author={Jiaqi Yang and Wei Hu and Jason D. Lee and Simon Shaolei Du},
  booktitle={ICLR},
  year={2021}
}
We study how representation learning can improve the efficiency of bandit problems. Specifically, we consider the setting where we play $T$ linear bandits of dimension $d$ concurrently, and these $T$ bandit tasks share a common $k$-dimensional linear representation with $k \ll d$. For the finite-action setting, we present a new algorithm that achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds played for each bandit. When $T$ is sufficiently large, our algorithm…
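To get a feel for the regret bound stated in the abstract, the following minimal sketch evaluates its leading-order terms numerically. It drops all constants and logarithmic factors, so the numbers are only indicative of scaling. The comparison baseline of $T\sqrt{dN}$ (each of the $T$ tasks solved independently as a $d$-dimensional linear bandit) is an assumption introduced here for illustration, and the function names are hypothetical.

```python
import math


def shared_representation_bound(T, N, d, k):
    """Leading-order terms T*sqrt(k*N) + sqrt(d*k*N*T) from the abstract.

    Constants and log factors hidden by the O-tilde are dropped.
    """
    return T * math.sqrt(k * N) + math.sqrt(d * k * N * T)


def independent_baseline_bound(T, N, d):
    """Assumed baseline (not from the abstract): T independent tasks,
    each incurring sqrt(d*N) regret, again without log factors."""
    return T * math.sqrt(d * N)


def bound_ratio(T, N, d, k):
    """Ratio of the two bounds; algebraically sqrt(k/d) + sqrt(k/T)."""
    return shared_representation_bound(T, N, d, k) / independent_baseline_bound(T, N, d)


if __name__ == "__main__":
    # Illustrative values only: many tasks, small shared dimension.
    print(bound_ratio(T=10_000, N=1_000, d=100, k=4))
```

Under these assumptions the ratio simplifies to $\sqrt{k/d} + \sqrt{k/T}$, so it falls below 1 only when both $k \ll d$ and $k \ll T$, which matches the abstract's requirement that $T$ be sufficiently large.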
Multi-task Representation Learning with Stochastic Linear Bandits
TLDR
This work proposes an efficient greedy policy that implicitly learns a low dimensional representation by encouraging the matrix formed by the task regression vectors to be of low rank, and derives an upper bound on the multi-task regret of this policy.
Nearly Minimax Algorithms for Linear Bandits with Shared Representation
TLDR
Novel algorithms for multi-task and lifelong linear bandits with shared representation are given, which match the known minimax regret lower bound up to logarithmic factors and close the gap in existing results.
Non-Stationary Representation Learning in Sequential Multi-Armed Bandits
TLDR
An online algorithm is introduced that detects task switches and learns and transfers a non-stationary representation in an adaptive fashion. A regret upper bound is derived for this algorithm, which significantly outperforms existing algorithms that do not learn the representation.
On the Power of Multitask Representation Learning in Linear MDP
TLDR
A Least-Activated-Feature-Abundance (LAFA) criterion is discovered, denoted as κ, with which it is proved that a straightforward least-square algorithm learns a policy which is sub-optimal, which theoretically explains the power of multitask representation learning in reducing sample complexity.
Non-Stationary Representation Learning in Sequential Linear Bandits
TLDR
This paper proposes an online algorithm that facilitates efficient decision-making by learning and transferring non-stationary representations in an adaptive fashion and proves that it outperforms the existing ones that treat tasks independently.
Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms
TLDR
This work shows it can achieve an Õ(min(S,A) ·α/ ) upper-bound, by employing efficient robust mean estimators for both uni-variate and high-dimensional random variables, and shows that this can be improved depending on the distributions of contexts.
Adaptive Clustering and Personalization in Multi-Agent Stochastic Linear Bandits
TLDR
This paper proposes a successive refinement algorithm that, for any agent, achieves regret scaling as O( √ T/N), and introduces a natural algorithm in which the personal bandit instances are initialized with the estimates of the global average model, showing that any agent i whose parameter deviates from the population average by i attains a regret scaling of Õ.
Towards Sample-efficient Overparameterized Meta-learning
TLDR
This work shows that surprisingly, overparameterization arises as a natural answer to these fundamental meta-learning questions, and develops a theory to explain how feature covariance can implicitly help reduce the sample complexity well below the degrees of freedom and lead to small estimation error.