# Multi-User Reinforcement Learning with Low Rank Rewards

```bibtex
@article{Agarwal2022MultiUserRL,
  title   = {Multi-User Reinforcement Learning with Low Rank Rewards},
  author  = {Naman Agarwal and Prateek Jain and S. Kowshik and Dheeraj M. Nagaraj and Praneeth Netrapalli},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2210.05355}
}
```

In this work, we consider the problem of collaborative multi-user reinforcement learning. In this setting there are multiple users with the same state-action space and transition probabilities but with different rewards. Under the assumption that the reward matrix of the N users has a low-rank structure – a standard and practically successful assumption in the offline collaborative filtering setting – the question is: can we design algorithms with significantly lower sample complexity compared…

## References

Showing 1–10 of 35 references.

### Overcoming the Long Horizon Barrier for Sample-Efficient Reinforcement Learning with Latent Low-Rank Structure

- Computer Science, ArXiv
- 2022

This work considers a class of MDPs that exhibit low-rank structure with unknown latent features, and shows that if one must use the low-rank structure of the MDP to estimate part of the Q-function, one must incur a sample complexity exponential in the horizon H to learn a near-optimal policy.

### Sample Complexity of Multi-task Reinforcement Learning

- Computer Science, UAI
- 2013

This paper introduces a new multi-task algorithm for a sequence of reinforcement-learning tasks when each task is sampled independently from (an unknown) distribution over a finite set of Markov decision processes whose parameters are initially unknown.

### When Collaborative Filtering Meets Reinforcement Learning

- Computer Science, ArXiv
- 2019

This paper models the recommender-user interactive recommendation problem as an agent-environment RL task, mathematically described by a Markov decision process (MDP), and proposes a novel CF-based MDP to achieve collaborative recommendations for the entire user community.

### Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model

- Computer Science, Mathematics, Machine Learning
- 2013

We consider the problems of learning the optimal action-value function and the optimal policy in discounted-reward Markov decision processes (MDPs). We prove new PAC bounds on the sample-complexity…

### Multi-task Deep Reinforcement Learning with PopArt

- Computer Science, AAAI
- 2019

This work proposes to automatically adapt the contribution of each task to the agent’s updates, so that all tasks have a similar impact on the learning dynamics, and learns a single trained policy that exceeds median human performance on this multi-task domain.

### Near-optimal Representation Learning for Linear Bandits and Linear RL

- Computer Science, ICML
- 2021

A sample-efficient algorithm, MTLROFUL, is proposed, which leverages the shared representation of M linear bandits to achieve regret that significantly improves upon the baseline Õ(Md√T) achieved by solving each task independently.

### Sharing Knowledge in Multi-Task Deep Reinforcement Learning

- Computer Science, ICLR
- 2020

This work studies the benefit of sharing representations among tasks to enable the effective use of deep neural networks in Multi-Task Reinforcement Learning, and extends the well-known finite-time bounds of Approximate Value-Iteration to the multi-task setting.

### Reward-Free Exploration for Reinforcement Learning

- Computer Science, ICML
- 2020

An efficient algorithm is given that conducts episodes of exploration and returns near-optimal policies for an arbitrary number of reward functions, and a nearly matching $\Omega(S^2AH^2/\epsilon^2)$ lower bound demonstrates the near-optimality of the algorithm in this setting.

### Markov Decision Processes with Continuous Side Information

- Computer Science, ALT
- 2018

This work considers a reinforcement learning (RL) setting in which the agent interacts with a sequence of episodic MDPs and proposes algorithms for learning in such Contextual Markov Decision Processes (CMDPs) under an assumption that the unobserved MDP parameters vary smoothly with the observed context.

### Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

- Computer Science, Mathematics, NeurIPS
- 2020

A simple, iterative learning algorithm is proposed that finds the optimal Q-function with sample complexity $\widetilde{O}(\frac{1}{\epsilon^{\max(d_1, d_2)+2}})$ when the optimal $Q$-function has low rank and the discount factor $\gamma$ is below a certain threshold, providing an exponential improvement in sample complexity.
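A generic way to exploit a low-rank $Q$-function, sketched below with a plain rank-truncated SVD (this is an illustrative stand-in, not the paper's specific iterative algorithm; all sizes and noise levels are made up): noisy entrywise estimates of $Q$ are denoised by projecting onto the best rank-r approximation.

```python
import numpy as np

# Illustrative low-rank matrix estimation: Q (S x A) has rank r, and we
# observe it entrywise with noise. A rank-r truncated SVD of the noisy
# matrix serves as the low-rank estimation step.
rng = np.random.default_rng(1)
S, A, r = 100, 20, 2
Q = rng.standard_normal((S, r)) @ rng.standard_normal((r, A))  # rank-r Q
noisy = Q + 0.01 * rng.standard_normal((S, A))                 # noisy estimates

U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
Q_hat = (U[:, :r] * s[:r]) @ Vt[:r]                            # best rank-r fit

err_before = np.linalg.norm(noisy - Q, "fro")
err_after = np.linalg.norm(Q_hat - Q, "fro")
print(err_before, err_after)
```

Because $Q$ itself has rank r and `Q_hat` is the best rank-r approximation of the noisy matrix, the triangle inequality guarantees the truncated estimate is at most a factor of two worse than the raw noisy estimate, and in practice truncation typically removes most of the noise lying outside the top-r singular directions.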