# Multi-User Reinforcement Learning with Low Rank Rewards

@article{Agarwal2022MultiUserRL,
  title={Multi-User Reinforcement Learning with Low Rank Rewards},
  author={Naman Agarwal and Prateek Jain and S. Kowshik and Dheeraj M. Nagaraj and Praneeth Netrapalli},
  journal={ArXiv},
  year={2022},
  volume={abs/2210.05355}
}
• Published 11 October 2022 · Computer Science · ArXiv
In this work, we consider the problem of collaborative multi-user reinforcement learning. In this setting there are multiple users with the same state-action space and transition probabilities but with different rewards. Under the assumption that the reward matrix of the N users has a low-rank structure – a standard and practically successful assumption in the offline collaborative filtering setting – the question is whether we can design algorithms with significantly lower sample complexity compared…
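The low-rank reward assumption above can be sketched numerically: if each of the N users' rewards over the |S||A| state-action pairs is generated from r latent factors, the full N × |S||A| reward matrix has rank at most r, which is the structure collaborative algorithms exploit to share samples across users. A minimal illustration (the dimensions N, SA, and r below are illustrative choices, not values from the paper):

```python
import numpy as np

# Hypothetical sketch of the low-rank reward assumption:
# rewards of N users over SA state-action pairs form R = U V^T, rank <= r.
rng = np.random.default_rng(0)
N, SA, r = 50, 200, 3          # users, state-action pairs, latent rank
U = rng.normal(size=(N, r))    # per-user latent preferences
V = rng.normal(size=(SA, r))   # per-state-action latent features
R = U @ V.T                    # full reward matrix, rank <= r

# Truncated SVD confirms the matrix is numerically rank r: only r
# singular values are nonzero, so far fewer than N * SA samples are
# needed in principle to pin down all users' rewards.
s = np.linalg.svd(R, compute_uv=False)
effective_rank = int((s > 1e-8).sum())
print(effective_rank)
```

Generically the product of two rank-r factors has exactly r nonzero singular values, so the printed effective rank matches r here.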

## References

Showing 1–10 of 35 references.

• ArXiv, 2022 (Computer Science): Considers a class of MDPs that exhibit low-rank structure where the latent features are unknown, and shows that if one must use the low-rank structure of the MDP to estimate part of the Q-function, one must incur a sample complexity exponential in the horizon H to learn a near-optimal policy.
• UAI, 2013 (Computer Science): Introduces a new multi-task algorithm for a sequence of reinforcement-learning tasks in which each task is sampled independently from an unknown distribution over a finite set of Markov decision processes whose parameters are initially unknown.
• ArXiv, 2019 (Computer Science): Models the recommender-user interactive recommendation problem as an agent-environment RL task, described mathematically by a Markov decision process (MDP), and proposes a novel CF-based MDP to achieve collaborative recommendations for the entire user community.
• Machine Learning, 2013 (Computer Science, Mathematics): Considers the problems of learning the optimal action-value function and the optimal policy in discounted-reward Markov decision processes (MDPs), proving new PAC bounds on the sample complexity.
• AAAI, 2019 (Computer Science): Proposes to automatically adapt the contribution of each task to the agent's updates so that all tasks have a similar impact on the learning dynamics, and learns a single trained policy that exceeds median human performance on this multi-task domain.
• ICML, 2021 (Computer Science): Proposes a sample-efficient algorithm, MTLROFUL, that leverages the shared representation of M linear bandits to achieve regret that significantly improves upon the baseline Õ(Md √ T ) achieved by solving each task independently.
• ICLR, 2020 (Computer Science): Studies the benefit of sharing representations among tasks to enable the effective use of deep neural networks in multi-task reinforcement learning, and extends the well-known finite-time bounds of Approximate Value Iteration to the multi-task setting.
• ICML, 2020 (Computer Science): Gives an efficient algorithm that conducts episodes of exploration and returns near-optimal policies for an arbitrary number of reward functions, together with a nearly matching $\Omega(S^2AH^2/\epsilon^2)$ lower bound demonstrating the near-optimality of the algorithm in this setting.
• ALT, 2018 (Computer Science): Considers a reinforcement learning (RL) setting in which the agent interacts with a sequence of episodic MDPs, and proposes algorithms for learning in such Contextual Markov Decision Processes (CMDPs) under the assumption that the unobserved MDP parameters vary smoothly with the observed context.
• NeurIPS, 2020 (Computer Science, Mathematics): Presents a simple, iterative learning algorithm that finds the optimal Q-function with sample complexity $\widetilde{O}(\frac{1}{\epsilon^{\max(d_1, d_2)+2}})$ when the optimal $Q$-function has low rank and the discount factor $\gamma$ is below a certain threshold, providing an exponential improvement in sample complexity.