Corpus ID: 53280207

Learning Latent Dynamics for Planning from Pixels

@article{Hafner2019LearningLD,
  title={Learning Latent Dynamics for Planning from Pixels},
  author={Danijar Hafner and Timothy P. Lillicrap and Ian S. Fischer and Ruben Villegas and David R Ha and Honglak Lee and James Davidson},
  journal={ArXiv},
  year={2019},
  volume={abs/1811.04551}
}
Planning has been very successful for control tasks with known environment dynamics. To leverage planning in unknown environments, the agent needs to learn the dynamics from interactions with the world. However, learning dynamics models that are accurate enough for planning has been a long-standing challenge, especially in image-based domains. We propose the Deep Planning Network (PlaNet), a purely model-based agent that learns the environment dynamics from images and chooses actions through… 
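As a rough illustration of the planning step sketched in the abstract, here is a minimal sketch of planning in a learned latent space with a cross-entropy-method-style search. The encode, latent_dynamics, and reward_model functions are hypothetical placeholders for learned networks, not the authors' implementation.

```python
# Minimal sketch of latent-space planning with a cross-entropy-method (CEM)
# style search, in the spirit of PlaNet. The encoder, latent dynamics, and
# reward model below are hypothetical toy stand-ins for learned networks.
import numpy as np

def encode(image):
    # Placeholder: a learned encoder would map the image to a latent state.
    return np.zeros(30)

def latent_dynamics(latent, action):
    # Placeholder: a learned transition model predicts the next latent state.
    return latent + 0.1 * np.tanh(action).mean()

def reward_model(latent):
    # Placeholder: a learned reward model scores a latent state.
    return -np.square(latent).sum()

def cem_plan(latent, horizon=12, candidates=200, iters=10, top_k=20, action_dim=1):
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iters):
        # Sample candidate action sequences from the current search distribution.
        plans = mean + std * np.random.randn(candidates, horizon, action_dim)
        returns = np.zeros(candidates)
        for i, plan in enumerate(plans):
            z = latent
            for action in plan:
                z = latent_dynamics(z, action)
                returns[i] += reward_model(z)
        # Refit the search distribution to the best-scoring sequences.
        elite = plans[np.argsort(returns)[-top_k:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0)
    return mean[0]  # Execute only the first action, then replan.

first_action = cem_plan(encode(np.zeros((64, 64, 3))))
```

Only the first action of the optimized sequence is executed before replanning, the usual model-predictive-control pattern this line of work builds on.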
Planning from Images with Deep Latent Gaussian Process Dynamics
TLDR
A deep latent Gaussian process dynamics (DLGPD) model is proposed that learns low-dimensional system dynamics from environment interactions with visual observations, using neural networks to learn the latent representation and Gaussian processes to model the system dynamics in the learned latent space.
Planning from Pixels using Inverse Dynamics Models
TLDR
This work proposes a novel way to learn latent world models by learning to predict sequences of future actions conditioned on task completion, which adaptively focuses modeling capacity on task-relevant dynamics while simultaneously serving as an effective heuristic for planning with sparse rewards.
Evolutionary Planning in Latent Space
TLDR
This paper proposes to learn a world model that enables Evolutionary Planning in Latent Space (EPLS) and uses Random Mutation Hill Climbing to find a sequence of actions that maximizes expected reward in the learned model of the world.
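For concreteness, the following is a minimal sketch of a Random Mutation Hill Climbing search over action sequences such as the one described above; rollout_return is a hypothetical placeholder for unrolling a learned world model and summing its predicted rewards.

```python
# Minimal sketch of Random Mutation Hill Climbing (RMHC) over action sequences
# inside a learned world model. rollout_return is a hypothetical stand-in for
# unrolling learned latent dynamics and summing predicted rewards.
import numpy as np

def rollout_return(actions):
    # Placeholder objective; a learned model would evaluate the plan here.
    return -np.square(actions - 0.5).sum()

def rmhc_plan(horizon=20, action_dim=3, iterations=500, mutation_rate=0.1):
    best = np.random.uniform(-1, 1, size=(horizon, action_dim))
    best_return = rollout_return(best)
    for _ in range(iterations):
        # Mutate a random subset of the action entries.
        candidate = best.copy()
        mask = np.random.rand(horizon, action_dim) < mutation_rate
        candidate[mask] = np.random.uniform(-1, 1, size=mask.sum())
        # Keep the mutant only if it improves the predicted return.
        candidate_return = rollout_return(candidate)
        if candidate_return > best_return:
            best, best_return = candidate, candidate_return
    return best[0]  # First action of the best plan found so far.

first_action = rmhc_plan()
```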
Heteroscedastic Uncertainty for Robust Generative Latent Dynamics
TLDR
This letter presents a method to jointly learn a latent state representation and the associated dynamics that is amenable to long-term planning and closed-loop control under perceptually difficult conditions, and demonstrates significantly more accurate predictions and improved control performance, compared to a model that assumes only homoscedastic uncertainty, under varying degrees of input degradation.
Learning Latent State Spaces for Planning through Reward Prediction
TLDR
This work introduces a model-based planning framework that learns a latent reward prediction model and then plans in the latent state space, and finds that the method can learn an accurate latent reward model in the presence of irrelevant information where existing model-based methods fail.
Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models
TLDR
An algorithm is developed that learns an action-conditional, predictive model of expected future observations, rewards and values from which a policy can be derived by following the gradient of the estimated value along imagined trajectories.
Sequential Generative Exploration Model for Partially Observable Reinforcement Learning
TLDR
This paper proposes a novel reward shaping approach that infers intrinsic rewards for the agent from a sequential generative model, and formulates the inference procedure for dynamics prediction as a multi-step forward prediction task, where temporal abstraction effectively helps to increase the expressiveness of the intrinsic reward signals.
PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals
TLDR
This work proposes PlanGAN, a model-based algorithm specifically designed for solving multi-goal tasks in environments with sparse rewards, and indicates that it can achieve comparable performance whilst being around 4-8 times more sample efficient.
Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning
TLDR
A new model-based RL algorithm, coined trajectory-wise multiple choice learning, is proposed; it learns a multi-headed dynamics model for dynamics generalization and incorporates context learning, which encodes dynamics-specific information from past experiences into a context latent vector, enabling the model to adapt online to unseen environments.
Offline Reinforcement Learning from Images with Latent Space Models
TLDR
This work proposes to learn a latent-state dynamics model and to represent the uncertainty of the model's predictions in the latent space; the resulting method significantly outperforms previous offline model-free RL methods as well as state-of-the-art online visual model-based RL methods.

References

SHOWING 1-10 OF 65 REFERENCES
Deep Variational Reinforcement Learning for POMDPs
TLDR
Deep variational reinforcement learning (DVRL) is proposed, which introduces an inductive bias that allows an agent to learn a generative model of the environment and perform inference in that model to effectively aggregate the available information.
Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control
TLDR
It is demonstrated that visual MPC can generalize to never-before-seen objects, both rigid and deformable, and solve a range of user-defined object manipulation tasks using the same model.
Universal Planning Networks
TLDR
This work finds that the representations learned are not only effective for goal-directed visual imitation via gradient-based trajectory optimization, but can also provide a metric for specifying goals using images.
Deep visual foresight for planning robot motion
Chelsea Finn, S. Levine · 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017
TLDR
This work develops a method for combining deep action-conditioned video prediction models with model-predictive control that uses entirely unlabeled training data, enables a real robot to perform nonprehensile manipulation (pushing objects), and can handle novel objects not seen during training.
Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning
TLDR
A robust method is shown for learning multimodal transitions using function approximation, a key prerequisite for model-based RL in stochastic domains.
SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning
TLDR
This paper presents a method for learning representations that are suitable for iterative model-based policy improvement, even when the underlying dynamical system has complex dynamics and image observations; these representations are optimized for inferring simple dynamics and cost models given data from the current policy.
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
TLDR
This paper proposes a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation, which matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples.
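As a rough illustration of the PETS idea summarized above, the following sketch propagates particles through a small ensemble of probabilistic dynamics models to score a candidate action sequence; the toy Gaussian models and the placeholder reward are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of trajectory sampling with an ensemble of probabilistic
# dynamics models, in the spirit of PETS. The toy Gaussian models stand in
# for learned neural networks.
import numpy as np

class ToyProbabilisticModel:
    """One ensemble member predicting a Gaussian next state (hypothetical)."""
    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)
        self.bias = self.rng.normal(scale=0.05, size=4)

    def predict(self, state, action):
        mean = state + 0.1 * action + self.bias
        std = 0.02 * np.ones_like(state)
        return mean + std * self.rng.standard_normal(state.shape)

def evaluate_plan(ensemble, state, actions, particles=20):
    # Propagate particles through randomly chosen ensemble members so that
    # both model (epistemic) and output-noise (aleatoric) uncertainty spread
    # through the imagined trajectories.
    returns = []
    for _ in range(particles):
        s = state.copy()
        total = 0.0
        for action in actions:
            model = ensemble[np.random.randint(len(ensemble))]
            s = model.predict(s, action)
            total += -np.square(s).sum()  # placeholder reward
        returns.append(total)
    return float(np.mean(returns))

ensemble = [ToyProbabilisticModel(seed=i) for i in range(5)]
print(evaluate_plan(ensemble, np.ones(4), np.zeros((10, 4))))
```

The planner would score many candidate action sequences this way and pick the one with the highest mean return across particles.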
Self-Supervised Visual Planning with Temporal Skip Connections
TLDR
This work introduces a video prediction model that can keep track of objects through occlusion by incorporating temporal skip-connections and demonstrates that this model substantially outperforms prior work on video prediction-based control.
Improving PILCO with Bayesian Neural Network Dynamics Models
TLDR
PILCO's framework is extended to use Bayesian deep dynamics models with approximate variational inference, allowing PILCO to scale linearly with the number of trials and the observation-space dimensionality; it is also shown that moment matching is a crucial simplifying assumption made by the model.
Model-Based Planning with Discrete and Continuous Actions
TLDR
This work shows that it is in fact possible to effectively perform planning via backprop in discrete action spaces, using a simple parameterization of the action vectors on the simplex combined with input noise when training the forward model.