Seeking entropy: complex behavior from intrinsic motivation to occupy action-state path space

Jorge Ramírez-Ruiz, Dmytro Grytskyy and Rubén Moreno-Bote
Intrinsic motivation generates behaviors that do not necessarily lead to immediate reward but help exploration and learning. Here we show that agents whose sole goal is to maximize occupancy of future actions and states, that is, to keep moving and exploring over the long term, are capable of complex behavior without any reference to external rewards. We find that action-state path entropy is the only measure consistent with additivity and other intuitive properties of expected future action-state… 
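The objective sketched in the abstract can be illustrated with a small tabular experiment. The toy MDP below (transition tensor `P`, weights `alpha`, `beta`, `gamma`) is entirely made up, and the log-sum-exp backup is a generic entropy-regularized value iteration that rewards both action entropy and next-state entropy; it is only a plausible reading of the idea, not the paper's exact operator:

```python
import numpy as np

# Hypothetical toy MDP: 3 states, 2 actions; P[s, a, s'] are transition probabilities.
P = np.array([
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]],
    [[0.5, 0.5, 0.0], [0.0, 0.2, 0.8]],
    [[0.0, 0.1, 0.9], [0.3, 0.3, 0.4]],
])
alpha, beta, gamma = 1.0, 1.0, 0.95  # action-entropy weight, state-entropy weight, discount

def state_entropy(p):
    """Shannon entropy of a next-state distribution (0 log 0 := 0)."""
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz))

# Per-(state, action) entropy of the next-state distribution.
H = np.array([[state_entropy(P[s, a]) for a in range(P.shape[1])]
              for s in range(P.shape[0])])

V = np.zeros(P.shape[0])
for _ in range(500):  # soft value iteration: reward is entropy, not an external signal
    Q = beta * H + gamma * P @ V              # P @ V gives E[V(s') | s, a]
    V_new = alpha * np.log(np.sum(np.exp(Q / alpha), axis=1))  # log-sum-exp backup
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

# The resulting policy is stochastic: a softmax over Q / alpha.
pi = np.exp(Q / alpha - np.max(Q / alpha, axis=1, keepdims=True))
pi /= pi.sum(axis=1, keepdims=True)
```

The softmax policy never collapses to a deterministic one: spreading probability over actions is itself part of the objective, which matches the abstract's claim that such agents keep moving and exploring.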

A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment
This paper investigates the use of empowerment in the presence of an extrinsic reward signal and proposes a unified Bellman optimality principle for empowered reward maximization, which generalizes both Bellman's optimality principle and recent information-theoretic extensions of it.
Efficient computation of optimal actions
  • E. Todorov
  • Computer Science
    Proceedings of the National Academy of Sciences
  • 2009
This work proposes a more structured formulation that greatly simplifies the construction of optimal control laws in both discrete and continuous domains, and enables computations that were not possible before.
Inverse Reward Design
This work introduces inverse reward design (IRD) as the problem of inferring the true objective from the designed reward and the training MDP, proposes approximate methods for solving IRD problems, and uses their solution to plan risk-averse behavior in test MDPs.
Modeling purposeful adaptive behavior with the principle of maximum causal entropy
The principle of maximum causal entropy is introduced: a general technique for applying information theory to decision-theoretic, game-theoretic, and control settings where relevant information is sequentially revealed over time.
An information-theoretic approach to curiosity-driven reinforcement learning
It is shown that Boltzmann-style exploration, one of the main exploration methods used in reinforcement learning, is optimal from an information-theoretic point of view, in that it optimally trades expected return for the coding cost of the policy.
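The Boltzmann-style exploration discussed in that entry is just a temperature-scaled softmax over Q-value estimates. A generic sketch (the Q-values and temperatures below are invented for illustration):

```python
import numpy as np

def boltzmann_policy(q_values, temperature=1.0):
    """Softmax (Boltzmann) distribution over actions given Q-value estimates."""
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()                 # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Low temperature -> near-greedy; high temperature -> near-uniform exploration.
q = np.array([1.0, 2.0, 0.5])
greedy_ish = boltzmann_policy(q, temperature=0.1)
uniform_ish = boltzmann_policy(q, temperature=100.0)
```

The temperature is exactly the knob that trades expected return against policy coding cost in the information-theoretic view.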
Empowerment for continuous agent-environment systems
The goal of this article is to extend empowerment to the significantly more important and relevant case of continuous vector-valued state spaces and initially unknown state transition probabilities.
Trading Value and Information in MDPs
The tradeoff between value and information, explored using the info-rl algorithm, provides a principled justification for stochastic (soft) policies and is used to show that these optimal policies are also robust to uncertainties in settings with only partial knowledge of the MDP parameters.
Where Do Rewards Come From
A general computational framework for reward is advanced that places it in an evolutionary context, formulating a notion of an optimal reward function given a fitness function and some distribution of environments.
Bridging the Gap Between Value and Policy Based Reinforcement Learning
A new RL algorithm, Path Consistency Learning (PCL), is developed that minimizes a notion of soft consistency error along multi-step action sequences extracted from both on- and off-policy traces and significantly outperforms strong actor-critic and Q-learning baselines across several benchmarks.
Linearly-solvable Markov decision problems
A class of MDPs is identified which greatly simplifies reinforcement learning; these problems have discrete state spaces and continuous control spaces and admit efficient approximations to traditional MDPs.
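What makes this class "linearly solvable" is that the exponentiated value function, the desirability z(s) = exp(-v(s)), satisfies a linear equation in z. The chain MDP below (passive dynamics `P`, state costs `q`) is a made-up example following Todorov's desirability-function formulation as commonly presented:

```python
import numpy as np

# Hypothetical 4-state chain; state 3 is an absorbing goal with zero cost.
P = np.array([                       # passive (uncontrolled) dynamics p(s'|s)
    [0.50, 0.50, 0.00, 0.00],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.00, 0.00, 0.00, 1.00],
])
q = np.array([1.0, 1.0, 1.0, 0.0])   # per-state costs; the goal costs nothing

# Desirability z(s) = exp(-v(s)) satisfies the *linear* fixed point z = exp(-q) * (P z).
z = np.ones(4)
for _ in range(1000):
    z = np.exp(-q) * (P @ z)
    z[3] = 1.0                       # boundary condition at the absorbing goal

v = -np.log(z)                       # optimal cost-to-go
# Optimal controlled transitions tilt the passive dynamics toward desirable states:
# u*(s'|s) is proportional to p(s'|s) * z(s').
U = P * z[None, :]
U /= U.sum(axis=1, keepdims=True)
```

Because the Bellman equation becomes linear in z, the optimum can be found by plain linear algebra (here, fixed-point iteration) instead of a max over actions at every state, which is the simplification the summary refers to.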