Corpus ID: 59553247

Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning

@article{Lee2019TsallisRL,
  title={Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning},
  author={Kyungjae Lee and Sungyub Kim and Sungbin Lim and Sungjoon Choi and Songhwai Oh},
  journal={ArXiv},
  year={2019},
  volume={abs/1902.00137}
}
In this paper, we present a new class of Markov decision processes (MDPs), called Tsallis MDPs, with Tsallis entropy maximization, which generalizes existing maximum entropy reinforcement learning (RL). A Tsallis MDP provides a unified framework for the original RL problem and RL with various types of entropy, including the well-known standard Shannon-Gibbs (SG) entropy, using an additional real-valued parameter, called an entropic index. By controlling the entropic index, we can generate…
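For reference, a standard form of the Tsallis entropy with entropic index q (the textbook definition; the paper's exact scaling and notation may differ) is

S_q(\pi(\cdot \mid s)) = \frac{1}{q-1}\Big(1 - \sum_a \pi(a \mid s)^q\Big),

which recovers the Shannon-Gibbs entropy -\sum_a \pi(a \mid s) \ln \pi(a \mid s) in the limit q \to 1.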
Citations

Maximum Entropy RL (Provably) Solves Some Robust RL Problems
This paper proves theoretically that standard maximum entropy RL is robust to some disturbances in the dynamics and the reward function, and provides the first rigorous proof and theoretical characterization of the MaxEnt RL robust set.
Hamilton-Jacobi-Bellman Equations for Maximum Entropy Optimal Control
The resulting algorithms are the first data-driven control methods that use an information-theoretic exploration mechanism in continuous time, and are shown to enhance the regularity of the viscosity solution and to be asymptotically consistent as the effect of entropy regularization diminishes.
A Regularized Approach to Sparse Optimal Policy in Reinforcement Learning
A generic method for devising regularization forms and off-policy actor-critic algorithms in complex environment settings is provided, and a full mathematical analysis of the proposed regularized MDPs is conducted.
Entropic Regularization of Markov Decision Processes
A broader family of f-divergences, and more concretely α-divergences, is considered; these inherit the beneficial property of providing the policy improvement step in closed form while also yielding a corresponding dual objective for policy evaluation.
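For context, one common parameterization of the α-divergence between densities p and q (conventions vary across papers, so this need not match the cited work's exact scaling) is

D_\alpha(p \,\|\, q) = \frac{1}{\alpha(\alpha - 1)}\Big( \int p(x)^{\alpha}\, q(x)^{1-\alpha}\, dx - 1 \Big),

which recovers the KL divergence KL(p || q) as α → 1 and the reverse KL divergence KL(q || p) as α → 0.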
Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence
A generalized policy mirror descent (GPMD) algorithm is proposed for solving regularized RL; it converges linearly to the global solution over an entire range of learning rates, in a dimension-free fashion, even when the regularizer lacks strong convexity and smoothness.
Promoting Stochasticity for Expressive Policies via a Simple and Efficient Regularization Method
This work proposes a novel regularization method that is compatible with a broad range of expressive policy architectures and shows that it outperforms state-of-the-art regularized RL methods in continuous control tasks.
Weighted Entropy Modification for Soft Actor-Critic
An algorithm motivated by self-balancing exploration with the introduced weight function is proposed, which achieves state-of-the-art performance on MuJoCo tasks despite its simple implementation.
Variational Inference MPC using Tsallis Divergence
A novel Tsallis Variational Inference-Model Predictive Control algorithm is derived that allows for effective control of the cost/reward transform and is characterized by superior performance in terms of mean and variance reduction of the associated cost.
The Gradient Convergence Bound of Federated Multi-Agent Reinforcement Learning with Efficient Communication
A utility function that balances reducing communication overheads against improving convergence performance is proposed, and two new optimization methods on top of variation-aware periodic averaging are developed.
A Functional Mirror Descent Perspective on Reinforcement Learning (2020)
Functional mirror descent offers a unifying perspective on optimization of statistical models and provides numerous advantages for the design and analysis of learning algorithms. It brings the…

References

Showing 1-10 of 31 references
Path Consistency Learning in Tsallis Entropy Regularized MDPs
A class of novel path consistency learning (PCL) algorithms, called sparse PCL, is proposed for the sparse ERL problem; it can work with both on-policy and off-policy data and is empirically compared with its soft counterpart, showing an advantage especially in problems with a large number of actions.
Effective Exploration for Deep Reinforcement Learning via Bootstrapped Q-Ensembles under Tsallis Entropy Regularization
A new DRL algorithm is developed that seamlessly integrates entropy-induced and bootstrap-induced techniques for efficient and deep exploration of the learning environment and is efficient in exploring actions with clear exploration value.
Modeling purposeful adaptive behavior with the principle of maximum causal entropy
The principle of maximum causal entropy is introduced, a general technique for applying information theory to decision-theoretic, game-theoretic, and control settings where relevant information is sequentially revealed over time.
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, and achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
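For context, the maximum entropy objective behind soft actor-critic, as it is commonly written with a temperature parameter α trading off reward and entropy, is

J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\big[ r(s_t, a_t) + \alpha\, \mathcal{H}(\pi(\cdot \mid s_t)) \big],

where \mathcal{H} denotes the Shannon-Gibbs entropy; this is the standard formulation and may omit details of the paper's exact presentation.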
Sparse Markov Decision Processes With Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning
A sparse Markov decision process (MDP) with a novel causal sparse Tsallis entropy regularization is proposed, together with a sparse value iteration method that solves the sparse MDP; the convergence and optimality of sparse value iteration are proven using the Banach fixed-point theorem, and the approach outperforms existing methods in terms of convergence speed and performance.
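The sparse Tsallis entropy regularizer corresponds to the entropic index q = 2, for which the optimal policy takes a sparsemax form over the (suitably scaled) Q-values, assigning exactly zero probability to sufficiently sub-optimal actions. Below is a minimal NumPy sketch of the generic sparsemax projection (Martins & Astudillo, 2016); the function name and example values are illustrative, not the authors' code.

import numpy as np

def sparsemax(z):
    """Euclidean projection of a logit vector z onto the probability simplex.

    Under sparse Tsallis entropy (q = 2) regularization, the optimal policy
    takes this form over the scaled Q-values, so clearly sub-optimal actions
    receive exactly zero probability.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]              # logits in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1.0 + k * z_sorted > cumsum    # indices kept in the support
    k_max = k[support][-1]                   # size of the support set
    tau = (cumsum[k_max - 1] - 1.0) / k_max  # water-filling threshold
    return np.maximum(z - tau, 0.0)

# Example: the clearly worst action gets probability exactly 0.
print(sparsemax([2.0, 1.9, -1.0]))  # -> [0.55 0.45 0.  ]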
Continuous Deep Q-Learning with Model-based Acceleration
This paper derives a continuous variant of the Q-learning algorithm, which it calls normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods, and shows that it substantially improves performance on a set of simulated robotic control tasks.
Bridging the Gap Between Value and Policy Based Reinforcement Learning
A new RL algorithm, Path Consistency Learning (PCL), is developed that minimizes a notion of soft consistency error along multi-step action sequences extracted from both on- and off-policy traces and significantly outperforms strong actor-critic and Q-learning baselines across several benchmarks.
Boltzmann Exploration Done Right
This paper presents a simple non-monotone schedule that guarantees near-optimal performance, albeit only when given prior access to key problem parameters that are typically not available in practical situations (like the time horizon $T$ and the suboptimality gap $\Delta$).
Reinforcement Learning with Deep Energy-Based Policies
A method for learning expressive energy-based policies for continuous states and actions, which has previously been feasible only in tabular domains, is proposed, along with a new algorithm, called soft Q-learning, that expresses the optimal policy via a Boltzmann distribution.
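For reference, the Boltzmann (energy-based) form of the optimal policy in this maximum entropy setting, as it is usually stated with temperature α, is

\pi^*(a \mid s) \propto \exp\big( \tfrac{1}{\alpha} Q^*_{\mathrm{soft}}(s, a) \big),

so the soft Q-function acts as the negative energy of the policy distribution.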
Equivalence Between Policy Gradients and Soft Q-Learning
There is a precise equivalence between Q-learning and policy gradient methods in the setting of entropy-regularized reinforcement learning, and it is shown that "soft" Q-learning is exactly equivalent to a policy gradient method.