Corpus ID: 211043624

Mutual Information-based State-Control for Intrinsically Motivated Reinforcement Learning

@article{Zhao2020MutualIS,
  title={Mutual Information-based State-Control for Intrinsically Motivated Reinforcement Learning},
  author={Rui Zhao and Volker Tresp and Wei Xu},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.01963}
}
In reinforcement learning, an agent learns to reach a set of goals by means of an external reward signal. In the natural world, intelligent organisms learn from internal drives, bypassing the need for external signals, which is beneficial for a wide range of tasks. Motivated by this observation, we propose to formulate an intrinsic objective as the mutual information between the goal states and the controllable states. This objective encourages the agent to take control of its environment… 
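The proposed objective, the mutual information I(goal states; controllable states), can be made concrete with a small estimator. The sketch below is a generic plug-in mutual-information estimate over discretized 1-D state samples, not the paper's own estimator; the function name and the experiment are hypothetical illustrations.

```python
import numpy as np

def mutual_information(goal_states, ctrl_states, bins=8):
    """Plug-in estimate of I(S_goal; S_ctrl) from paired 1-D samples,
    computed from a discretized joint histogram (in nats)."""
    joint, _, _ = np.histogram2d(goal_states, ctrl_states, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal over goal states
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal over controllable states
    mask = p_xy > 0
    return float((p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])).sum())

# Fully coupled states carry high mutual information; independent ones near zero.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
mi_dependent = mutual_information(x, x)
mi_independent = mutual_information(x, rng.normal(size=5000))
```

An agent maximizing such a quantity as an intrinsic reward is pushed toward states where its controllable degrees of freedom are predictive of the goal state.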

Citations

Unsupervised Reinforcement Learning for Transferable Manipulation Skill Discovery

This work proposes an unsupervised method for transferable manipulation skill discovery that enables the agent to learn interaction behavior, the key aspect of robotic manipulation learning, without access to the environment reward, and to generalize to arbitrary downstream manipulation tasks using the learned task-agnostic skills.

Rethinking Goal-Conditioned Supervised Learning and Its Connection to Offline RL

Experiments demonstrate that WGCSL outperforms current offline goal-conditioned approaches by a large margin in both learning efficiency and convergent performance.

Rethinking Goal-conditioned Supervised Learning and Its Connection to Offline RL

Weighted GCSL (WGCSL) is proved to optimize an equivalent lower bound of the goal-conditioned RL objective and to generate monotonically improving policies via an iterated scheme; because this holds for any behavior policy, WGCSL can be applied in both online and offline settings.

References

SHOWING 1-10 OF 54 REFERENCES

Unsupervised Control Through Non-Parametric Discriminative Rewards

An unsupervised learning algorithm trains agents to achieve perceptually specified goals using only a stream of observations and actions; this leads to a co-operative game and a learned reward function that reflects similarity in controllable aspects of the environment rather than distance in observation space.

Maximum Entropy-Regularized Multi-Goal Reinforcement Learning

A novel multi-goal RL objective based on weighted entropy is proposed, which encourages the agent to maximize expected return while achieving more diverse goals, and a maximum-entropy-based prioritization framework is developed to optimize the proposed objective.
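The prioritization idea can be sketched with a simple inverse-density weighting over achieved goals. This is a minimal, hypothetical illustration (the paper derives its weights from a weighted-entropy objective, not this exact rule); all names are assumptions.

```python
import numpy as np

def priority_weights(goal_counts, eps=1e-8):
    """Replay-sampling weights over achieved goals: rarely achieved goals
    get higher probability, pushing the achieved-goal distribution
    toward higher entropy during training."""
    counts = np.asarray(goal_counts, dtype=float)
    density = counts / counts.sum()          # empirical goal density
    w = 1.0 / (density + eps)                # favour rare goals
    return w / w.sum()                       # normalize to a distribution

# The goal achieved only once gets the largest sampling weight.
w = priority_weights([100, 10, 1])
```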

Variational Intrinsic Control

This paper instantiates two policy-gradient-based algorithms, one that creates an explicit embedding space of options and one that represents options implicitly; both provide an explicit measure of empowerment in a given state that can be used by an empowerment-maximizing agent.

Continuous control with deep reinforcement learning

This work presents a model-free actor-critic algorithm based on the deterministic policy gradient that operates over continuous action spaces, and demonstrates that for many tasks the algorithm can learn policies end-to-end, directly from raw pixel inputs.
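The deterministic policy gradient behind this algorithm can be illustrated on a toy problem. The sketch below uses a hypothetical quadratic critic Q(s, a) = -(a - 2s)^2 with a linear policy and shows only the actor update rule, not the full algorithm with replay buffers and target networks.

```python
import numpy as np

# Deterministic policy gradient:
#   grad_theta J = E[ grad_theta mu(s) * grad_a Q(s, a)|a=mu(s) ].
# Toy critic Q(s, a) = -(a - 2s)^2, so the optimal policy is a = 2s.
def train_actor(steps=500, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    theta = 0.0                        # linear policy mu(s) = theta * s
    for _ in range(steps):
        s = rng.normal()
        a = theta * s
        dq_da = -2.0 * (a - 2.0 * s)   # critic gradient w.r.t. the action
        theta += lr * s * dq_da        # chain rule: dmu/dtheta = s
    return theta

theta = train_actor()                  # converges near the optimum theta = 2
```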

Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research

A suite of challenging continuous control tasks (integrated with OpenAI Gym) based on currently existing robotics hardware and following a Multi-Goal Reinforcement Learning (RL) framework are introduced.

Curiosity-Driven Experience Prioritization via Density Estimation

A novel Curiosity-Driven Prioritization (CDP) framework encourages the agent to over-sample trajectories with rarely achieved goal states; experimental results show that CDP improves both the performance and sample efficiency of reinforcement learning agents compared to state-of-the-art methods.

Human-level control through deep reinforcement learning

This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

Energy-Based Hindsight Experience Prioritization

An energy-based framework for prioritizing hindsight experience in robotic manipulation tasks, inspired by the work-energy principle in physics, hypothesizes that replaying episodes with high trajectory energy is more effective for reinforcement learning in robotics.
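A simplified trajectory-energy score can be sketched as follows. This is an assumption-laden illustration: the function name is hypothetical, and only translational kinetic plus gravitational potential energy are included (the paper also considers other terms).

```python
import numpy as np

def trajectory_energy(positions, dt=0.04, mass=1.0, g=9.81):
    """Mechanical-energy proxy of an object trajectory: gravitational
    potential (m*g*z) plus translational kinetic (0.5*m*v^2) energy,
    summed over timesteps. Higher-energy episodes get replay priority."""
    positions = np.asarray(positions, dtype=float)   # shape (T, 3): x, y, z
    v = np.diff(positions, axis=0) / dt              # finite-difference velocity
    kinetic = 0.5 * mass * (v ** 2).sum(axis=1)
    potential = mass * g * positions[1:, 2]
    return float((kinetic + potential).sum())

# A trajectory where the object is lifted scores higher than a static one.
static = np.zeros((10, 3))
lifted = np.column_stack([np.zeros(10), np.zeros(10), np.linspace(0.0, 0.5, 10)])
```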

Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery

This paper studies how to automatically learn dynamical distances, a measure of the expected number of time steps needed to reach a given goal state from any other state; these distances provide well-shaped reward functions for reaching new goals, making it possible to learn complex tasks efficiently.
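The supervision signal for such a distance model can be sketched directly from a rollout: each ordered pair of visited states is labeled with the number of steps separating them. This is a minimal, hypothetical illustration of the regression targets for a single trajectory, not the paper's full method.

```python
def dynamical_distances(trajectory):
    """Empirical dynamical-distance targets from one rollout: for each
    ordered state pair (s_i, s_j) with i <= j, record j - i, the number
    of time steps between the visits. Averaging such targets over many
    rollouts would supervise a learned distance model."""
    targets = {}
    for i, s_from in enumerate(trajectory):
        for j in range(i, len(trajectory)):
            targets[(s_from, trajectory[j])] = j - i
    return targets

d = dynamical_distances(["s0", "s1", "s2", "s3"])
# A goal-reaching reward can then be shaped as -d(state, goal).
```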

...