Corpus ID: 219177064

PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals

@article{Charlesworth2020PlanGANMP,
  title={PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals},
  author={Henry Charlesworth and G. Montana},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.00900}
}
Learning with sparse rewards remains a significant challenge in reinforcement learning (RL), especially when the aim is to train a policy capable of achieving multiple different goals. To date, the most successful approaches for dealing with multi-goal, sparse reward environments have been model-free RL algorithms. In this work we propose PlanGAN, a model-based algorithm specifically designed for solving multi-goal tasks in environments with sparse rewards. Our method builds on the fact that… 
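For context, the multi-goal sparse reward that this line of work targets is conventionally defined as in the Fetch-style robotics benchmarks: the agent receives a non-negative signal only when the achieved goal lands within a small tolerance of the desired goal. A minimal sketch (the tolerance value here is illustrative, not taken from the paper):

    import numpy as np

    def sparse_reward(achieved_goal, desired_goal, eps=0.05):
        # 0 when the achieved goal lies within eps of the desired goal,
        # -1 otherwise; eps = 0.05 is an illustrative tolerance.
        return 0.0 if np.linalg.norm(achieved_goal - desired_goal) <= eps else -1.0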

Citations

Density-based Curriculum for Multi-goal Reinforcement Learning with Sparse Rewards

This paper proposes a density-based curriculum learning method for efficient exploration with sparse rewards and better generalization to the desired goal distribution, and shows that it outperforms state-of-the-art baselines in both data efficiency and success rate.
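A minimal sketch of the density-based idea, assuming training goals are drawn from a buffer of previously achieved goals (the bandwidth and candidate counts are illustrative, not taken from the paper):

    import numpy as np
    from sklearn.neighbors import KernelDensity

    def sample_frontier_goals(achieved_goals, n=16, bandwidth=0.1):
        # achieved_goals: (N, goal_dim) array of goals the agent has reached.
        # Fit a KDE to them, then keep the candidates with the LOWEST estimated
        # density: rarely-visited goals sit at the frontier of current competence.
        kde = KernelDensity(bandwidth=bandwidth).fit(achieved_goals)
        idx = np.random.choice(len(achieved_goals), size=10 * n)
        candidates = achieved_goals[idx]
        return candidates[np.argsort(kde.score_samples(candidates))[:n]]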

Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space

Planning to Practice is proposed, a method that makes it practical to train goal-conditioned policies for long-horizon tasks requiring multiple distinct types of interactions to solve, and that can generate feasible sequences of subgoals enabling the policy to solve the target tasks efficiently.

MHER: Model-based Hindsight Experience Replay

Model-based Hindsight Experience Replay is proposed, which exploits experiences more efficiently by leveraging environmental dynamics to generate virtual achieved goals and achieves significantly higher sample efficiency than previous state-of-the-art methods.
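A hedged sketch of the underlying idea of generating virtual achieved goals with a learned model; the callables `step_model`, `act`, and `achieved_goal_of` are assumed interfaces for illustration, not MHER's actual API:

    def virtual_goal_relabel(transitions, step_model, act, achieved_goal_of, horizon=3):
        # step_model(s, a) -> next state (learned dynamics),
        # act(s, g) -> action (current policy),
        # achieved_goal_of(s) -> goal coordinates realized in state s.
        # Roll the model forward from each stored state; the goal the imagined
        # trajectory ends up achieving becomes a virtual relabeled goal.
        relabeled = []
        for state, action, goal in transitions:
            s = state
            for _ in range(horizon):
                s = step_model(s, act(s, goal))
            relabeled.append((state, action, achieved_goal_of(s)))
        return relabeled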

Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey

A typology of methods at the intersection of deep RL and developmental approaches is proposed, in which deep RL algorithms are trained to tackle the developmental robotics problem of autonomously acquiring open-ended repertoires of skills.

MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks

FGI (Foresight Goal Inference), a new relabeling strategy that relabels goals by looking into the future with a learned dynamics model, is developed, and the MapGo framework (Model-Assisted Policy Optimization for Goal-oriented tasks) is introduced.

Goal-Conditioned Reinforcement Learning: Problems and Solutions

A comprehensive overview of the challenges and algorithms for goal-conditioned reinforcement learning is provided, and potential future directions that recent research focuses on are discussed.

Imaginary Hindsight Experience Replay: Curious Model-based Learning for Sparse Reward Tasks

This work proposes a simple model-based method tailored for sparse-reward multi-goal tasks that foregoes the need for complicated reward engineering and minimises real-world interactions by incorporating imaginary data into policy updates.

Learning to Shape Rewards using a Game of Two Partners

It is proved that ROSA, which adopts existing RL algorithms, learns to construct a shaping-reward function that is beneficial to the task, thus ensuring efficient convergence to high-performance policies.

Learning to Shape Rewards using a Game of Switching Controls

It is proved that ROSA, which easily adopts existing RL algorithms, learns to construct a shaping-reward function that is tailored to the task, thus ensuring efficient convergence to high-performance policies.

Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space

This paper proposes Planning to Practice (PTP), a method that makes it practical to train goal-conditioned policies for long-horizon tasks that require multiple distinct types of interactions to solve, together with a hybrid offline reinforcement learning approach with online fine-tuning.

References

Showing 1-10 of 46 references

Planning with Goal-Conditioned Policies

This work shows that goal-conditioned policies learned with RL can be incorporated into planning, such that a planner can focus on which states to reach, rather than how those states are reached, and proposes using a latent variable model to compactly represent the set of valid states.
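One way to read the core idea: with a goal-conditioned value function V(s, g) that approximates reachability under sparse rewards, a candidate chain of subgoals can be scored by its weakest link. A minimal sketch under that assumption (the interface is illustrative, not the paper's code):

    def chain_feasibility(value_fn, start, subgoals, final_goal):
        # Score a subgoal sequence by the least-reachable consecutive pair;
        # a planner then searches for the chain that maximizes this score.
        chain = [start, *subgoals, final_goal]
        return min(value_fn(s, g) for s, g in zip(chain, chain[1:]))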

Reinforcement learning for robotic manipulation using simulated locomotion demonstrations

This paper introduces a framework whereby an object locomotion policy is initially obtained using a realistic physics simulator, and this policy is then used to generate auxiliary rewards, called simulated locomotion demonstration rewards (SLDRs), which enable learning of the robot manipulation policy.

Learning Latent Dynamics for Planning from Pixels

The Deep Planning Network (PlaNet) is proposed, a purely model-based agent that learns the environment dynamics from images and chooses actions through fast online planning in latent space using a latent dynamics model with both deterministic and stochastic transition components.
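PlaNet selects actions with the cross-entropy method (CEM) over imagined rollouts in latent space. A minimal NumPy sketch of that planning loop, where `rollout_return(state, actions)` is an assumed stand-in for the return estimated under the learned latent model:

    import numpy as np

    def cem_plan(rollout_return, state, horizon=12, pop=1000, elites=100,
                 iters=10, act_dim=2):
        # Iteratively refit a diagonal Gaussian over action sequences to the
        # elite (highest predicted return) samples, then execute only the first
        # action and replan at the next step (model-predictive control).
        mean = np.zeros((horizon, act_dim))
        std = np.ones((horizon, act_dim))
        for _ in range(iters):
            seqs = mean + std * np.random.randn(pop, horizon, act_dim)
            returns = np.array([rollout_return(state, seq) for seq in seqs])
            elite = seqs[np.argsort(returns)[-elites:]]
            mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
        return mean[0]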

Visual Reinforcement Learning with Imagined Goals

An algorithm is proposed that acquires general-purpose skills by combining unsupervised representation learning and reinforcement learning of goal-conditioned policies; it is efficient enough to learn policies that operate on raw image observations and goals for a real-world robotic system, and substantially outperforms prior techniques.
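In this line of work the reward is computed in a learned latent space rather than pixel space. A sketch, assuming `encode` maps an image to the mean of its VAE latent encoding:

    import numpy as np

    def latent_goal_reward(encode, obs_image, goal_image):
        # Negative Euclidean distance between latent encodings of the current
        # observation and the imagined goal image serves as a dense reward.
        return -np.linalg.norm(encode(obs_image) - encode(goal_image))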

Competitive Experience Replay

This work proposes a novel method called competitive experience replay, which efficiently supplements a sparse reward by placing learning in the context of an exploration competition between a pair of agents, creating a competitive game designed to drive exploration.

Learning To Reach Goals Without Reinforcement Learning

This work presents a theoretical result linking self-supervised imitation learning and reinforcement learning, along with empirical results showing that the method performs competitively with more complex reinforcement learning methods on a range of challenging goal-reaching problems, while yielding advantages in stability and use of offline data.

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

The MuZero algorithm is presented, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics.
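MuZero's tree search selects actions with a pUCT rule that mixes the learned value estimate with a prior-weighted exploration bonus. A sketch of that scoring rule (the constants c1 and c2 follow the values reported in the paper; the flattened arguments stand in for node fields):

    import math

    def puct_score(parent_visits, child_value, child_prior, child_visits,
                   c1=1.25, c2=19652):
        # The exploration bonus grows with the policy prior and total parent
        # visits, and shrinks as this particular child is visited more often.
        bonus = child_prior * math.sqrt(parent_visits) / (1 + child_visits)
        bonus *= c1 + math.log((parent_visits + c2 + 1) / c2)
        return child_value + bonus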

Model-Based Reinforcement Learning for Atari

Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models, is described and a comparison of several model architectures is presented, including a novel architecture that yields the best results in the authors' setting.

Automatic Goal Generation for Reinforcement Learning Agents

This work uses a generator network to propose tasks for the agent to try to achieve, specified as goal states, and shows that, by using this framework, an agent can efficiently and automatically learn to perform a wide set of tasks without requiring any prior knowledge of its environment.
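The generator in this approach is trained on goals labeled as being of intermediate difficulty for the current agent. A sketch of that labeling step (the R_min/R_max thresholding follows the paper's formulation; the specific values here are illustrative):

    def label_goid(success_rates, r_min=0.1, r_max=0.9):
        # A goal counts as a Goal of Intermediate Difficulty (GOID) when the
        # agent's recent success rate on it is neither trivial nor hopeless;
        # these binary labels supervise the goal-generating GAN.
        return {goal: r_min <= rate <= r_max for goal, rate in success_rates.items()}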

Energy-Based Hindsight Experience Prioritization

An energy-based framework for prioritizing hindsight experience in robotic manipulation tasks is proposed; inspired by the work-energy principle in physics, it hypothesizes that replaying episodes with high trajectory energy is more effective for reinforcement learning in robotics.
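A sketch of the prioritization signal, assuming the manipulated object's positions are logged each step and the z-axis is height (the mass, timestep, and use of only kinetic plus gravitational potential energy are illustrative simplifications):

    import numpy as np

    def trajectory_energy(positions, dt=0.04, m=1.0, g=9.81):
        # positions: (T, 3) array of object positions over an episode.
        # Approximate kinetic + potential energy at each step and sum the
        # positive increases; episodes with higher total energy are then
        # sampled for replay with higher priority.
        v = np.diff(positions, axis=0) / dt
        kinetic = 0.5 * m * (v ** 2).sum(axis=1)
        potential = m * g * positions[1:, 2]
        energy = kinetic + potential
        return np.maximum(np.diff(energy), 0).sum()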