Corpus ID: 239998785

Local Differential Privacy for Regret Minimization in Reinforcement Learning

Authors: Evrard Garcelon, Vianney Perchet, Ciara Pike-Burke, Matteo Pirotta
Reinforcement learning algorithms are widely used in domains where it is desirable to provide a personalized service. In these domains, it is common for user data to contain sensitive information that needs to be protected from third parties. Motivated by this, we study privacy in the context of finite-horizon Markov Decision Processes (MDPs) by requiring information to be obfuscated on the user side. We formulate this notion of privacy for RL by leveraging the local differential privacy (LDP)…
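The paper's mechanism privatizes full RL trajectories; as a minimal, hypothetical illustration of the LDP guarantee the abstract refers to (not the authors' mechanism), here is classic randomized response for a single bit, together with the server-side debiasing step:

```python
import math
import random

def randomized_response(bit: bool, epsilon: float) -> bool:
    """Report the true bit with probability e^eps / (e^eps + 1), flip it
    otherwise; this satisfies eps-local differential privacy for one bit."""
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p_keep else not bit

def debiased_mean(reports, epsilon: float) -> float:
    """Recover an unbiased estimate of the true frequency of 1s from the
    randomized reports; the aggregator never sees the raw bits."""
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed + p_keep - 1.0) / (2.0 * p_keep - 1.0)
```

Since E[report] = f(2p - 1) + (1 - p) for true frequency f and keep-probability p, solving for f gives the debiasing formula above; the obfuscation happens entirely on the user side, matching the threat model the abstract describes.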

Related papers


Private Reinforcement Learning with PAC and Regret Guarantees
A private optimism-based learning algorithm is developed that simultaneously achieves strong PAC and regret bounds while satisfying a joint differential privacy (JDP) guarantee, and lower bounds are presented on the sample complexity and regret of reinforcement learning subject to JDP.
Privacy-Preserving Q-Learning with Functional Noise in Continuous Spaces
This work considers differentially private algorithms for reinforcement learning in continuous spaces, such that neighboring reward functions are indistinguishable, and establishes rigorous privacy guarantees through analyses of the kernel of the noise space, probabilistic bounds on the noise samples, and composition over the iterations.
Multi-Armed Bandits with Local Differential Privacy
This paper studies multi-armed bandits under local differential privacy, proves a regret lower bound, and proposes algorithms whose regret upper bounds match the lower bound up to constant factors.
Differentially Private Contextual Linear Bandits
This paper gives a general scheme that converts the classic linear-UCB algorithm into a joint differentially private algorithm via the tree-based aggregation algorithm, and gives the first lower bound on the additional regret that any private algorithm for the MAB problem must incur.
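The tree-based algorithm mentioned above is the standard binary counting mechanism for releasing private running sums, in which each prefix sum aggregates only O(log T) noise terms rather than T. A minimal sketch of the streaming version (illustrative only, not the paper's implementation; values assumed to lie in [0, 1]):

```python
import math
import random

class TreeCounter:
    """Binary-tree mechanism for differentially private running sums over a
    stream of at most T values in [0, 1]. Each released prefix sum combines
    O(log T) Laplace-noised tree nodes instead of T noised values."""

    def __init__(self, T: int, epsilon: float):
        self.levels = max(1, T.bit_length())
        self.scale = self.levels / epsilon        # per-node Laplace scale
        self.t = 0
        self.alpha = [0.0] * self.levels          # exact partial sums
        self.alpha_hat = [0.0] * self.levels      # noised partial sums

    def _laplace(self) -> float:
        u = random.random() - 0.5
        return -self.scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

    def add(self, x: float) -> float:
        """Ingest the next value and return the noisy prefix sum so far."""
        self.t += 1
        i = (self.t & -self.t).bit_length() - 1   # lowest set bit of t
        # close all tree nodes below level i into the level-i node
        self.alpha[i] = x + sum(self.alpha[:i])
        for j in range(i):
            self.alpha[j] = self.alpha_hat[j] = 0.0
        self.alpha_hat[i] = self.alpha[i] + self._laplace()
        # the prefix sum is the union of nodes given by the bits of t
        return sum(self.alpha_hat[j] for j in range(self.levels)
                   if (self.t >> j) & 1)
```

The design choice is the usual one: per-node noise grows only logarithmically in T, so the error of every released sum is polylogarithmic rather than linear in the stream length.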
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
An algorithm for finite-horizon discrete MDPs, with an associated analysis, that yields state-of-the-art worst-case regret bounds in the dominant terms and substantially tighter bounds when the RL environment has a small environmental norm, a quantity that depends on the variance of the next-state value functions.
How You Act Tells a Lot: Privacy-Leakage Attack on Deep Reinforcement Learning
This is the first work to investigate privacy leakage in DRL settings and it is shown that DRL-based agents do potentially leak privacy-sensitive information from the trained policies.
Corrupt Bandits for Preserving Local Privacy
A frequentist algorithm, KLUCB-CF, and a Bayesian algorithm, TS-CF, are devised, and a lower bound on the expected regret of any bandit algorithm in this corrupted setting is provided.
Locally Private Distributed Reinforcement Learning
This approach yields a robust agent that performs well across distributed private environments, and it is the first work to realize distributed reinforcement learning under LDP.
Near-optimal Regret Bounds for Reinforcement Learning
This work presents a reinforcement learning algorithm with total regret O(DS√(AT)) after T steps for any unknown MDP with S states, A actions per state, and diameter D, and proposes a new parameter: an MDP has diameter D if, for any pair of states s, s', there is a policy that moves from s to s' in at most D steps in expectation.
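In general the diameter involves expected hitting times under stochastic transitions, but for a deterministic MDP it reduces to the longest shortest path between state pairs, which a small sketch can compute (toy illustration, not from the paper):

```python
def mdp_diameter_deterministic(n_states, transitions):
    """Diameter of a deterministic MDP: the largest, over ordered pairs
    (s, s'), of the fewest steps needed to reach s' from s when actions are
    chosen optimally. `transitions[s]` lists the one-step successors of
    state s (one entry per action)."""
    INF = float("inf")
    dist = [[INF] * n_states for _ in range(n_states)]
    for s in range(n_states):
        dist[s][s] = 0
        for s2 in transitions[s]:
            if s2 != s:
                dist[s][s2] = 1
    # Floyd-Warshall all-pairs shortest paths
    for k in range(n_states):
        for i in range(n_states):
            for j in range(n_states):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return max(dist[i][j] for i in range(n_states)
               for j in range(n_states) if i != j)
```

For example, a 4-state one-directional ring (each state's only action moves to the next state) has diameter 3, since the hardest pair requires traversing three edges.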
Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs
This work designs and analyzes the exploration bonus in the more challenging infinite-horizon undiscounted setting and shows that the resulting algorithm (SCCAL+) achieves the same regret bound as UCCRL (Ortner and Ryabko, 2012) while being the first implementable algorithm for this setting.