Dynamic Planning Networks

@article{Tasfi2021DynamicPN,
  title={Dynamic Planning Networks},
  author={Norman L. Tasfi and Miriam A. M. Capretz},
  journal={2021 International Joint Conference on Neural Networks (IJCNN)},
  year={2021},
  pages={1-9}
}
We introduce Dynamic Planning Networks (DPN), a novel architecture for deep reinforcement learning that combines model-based and model-free aspects for online planning. Our architecture learns to dynamically construct plans using a learned state-transition model by selecting and traversing between simulated states and actions to maximize information before acting. DPN learns to efficiently form plans by expanding a single action-conditional state transition at a time instead of exhaustively…
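The abstract describes an incremental planning loop: a learned transition model is queried one action-conditional expansion at a time, with an inner policy choosing which simulated state to traverse to next. Below is a minimal sketch of that loop; transition_model, select_expansion, and the planning budget are hypothetical stand-ins for the learned components, not the paper's actual networks.

import numpy as np

rng = np.random.default_rng(0)

def transition_model(state, action):
    # Stand-in for the learned state-transition model: returns a
    # simulated next state for the chosen (state, action) pair.
    return state + 0.1 * (action + rng.normal())

def select_expansion(plan_states):
    # Stand-in for the learned inner policy: pick which simulated
    # state to traverse to and which action to expand from it.
    return rng.integers(len(plan_states)), int(rng.integers(2))

def plan(root_state, budget=8):
    # The plan grows by one action-conditional transition per step,
    # rather than exhaustively expanding every action everywhere.
    plan_states = [root_state]
    for _ in range(budget):
        idx, action = select_expansion(plan_states)
        plan_states.append(transition_model(plan_states[idx], action))
    return plan_states

print(len(plan(root_state=0.0)), "simulated states gathered before acting")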

Citations

Non-Linear Rewards for Successor Features

TLDR
A novel improvement to the successor feature framework is proposed, in which the reward function is assumed to be a non-linear function of the state features, thereby increasing its representational power and making it possible to incorporate the current state into the reward.
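A rough illustration of the claimed change, assuming the standard successor-feature setup in which reward is linear in the state features; the non-linear readout below is a hypothetical tanh network, not the paper's actual architecture.

import numpy as np

rng = np.random.default_rng(0)
d = 4
phi = rng.normal(size=d)    # state features phi(s)
psi = rng.normal(size=d)    # successor features psi(s, a)
w = rng.normal(size=d)      # reward weights

# Standard successor-feature assumption: reward is linear in the
# features, so value also reads out linearly from psi.
r_linear = phi @ w
q_linear = psi @ w

# Proposed change: the reward is a non-linear function of the
# features (a tiny tanh network here, purely for illustration).
W1, W2 = rng.normal(size=(8, d)), rng.normal(size=8)
r_nonlinear = W2 @ np.tanh(W1 @ phi)

print(round(r_linear, 3), round(q_linear, 3), round(r_nonlinear, 3))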

Policy Improvement by Planning with Gumbel

TLDR
Gumbel AlphaZero and Gumbel MuZero, respectively without and with model-learning, match the state of the art on Go, chess, and Atari, and significantly improve prior performance when planning with few simulations.
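The root-action sampling these methods build on is the Gumbel-top-k trick: perturbing the policy logits with Gumbel noise and keeping the top k yields k actions sampled without replacement. A minimal sketch follows (the sequential-halving search that Gumbel MuZero runs over these candidates is omitted).

import numpy as np

rng = np.random.default_rng(0)

logits = np.array([1.0, 0.5, 0.2, -0.3, -1.0])   # policy logits at the root
gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))

# Adding Gumbel noise and taking the top k yields k root actions
# sampled without replacement, in proportion to the policy.
k = 3
sampled = np.argsort(logits + gumbel)[::-1][:k]
print("root actions to simulate:", sampled)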

The Differentiable Cross-Entropy Method

TLDR
A differentiable variant of CEM is introduced that enables differentiating the output of CEM with respect to the objective function's parameters, bringing CEM inside the end-to-end learning pipeline, where this has otherwise been impossible.
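For context, a minimal vanilla CEM loop on a toy objective; the hard top-k elite selection marked below is the non-differentiable step that the differentiable variant relaxes (the exact relaxation is not reproduced here).

import numpy as np

rng = np.random.default_rng(0)

def objective(x, theta=2.0):
    # Toy objective; DCEM's point is that the CEM output can be
    # differentiated with respect to parameters like theta.
    return (x - theta) ** 2

mu, sigma = 0.0, 2.0
for _ in range(20):
    samples = rng.normal(mu, sigma, size=64)
    # Hard top-k elite selection: the non-differentiable step that
    # the differentiable variant replaces with a soft relaxation.
    elites = samples[np.argsort(objective(samples))[:8]]
    mu, sigma = elites.mean(), elites.std() + 1e-6

print("CEM estimate of the minimizer:", round(mu, 3))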

References

TreeQN and ATreeC: Differentiable Tree Planning for Deep Reinforcement Learning

TLDR
TreeQN is proposed, a differentiable, recursive, tree-structured model that serves as a drop-in replacement for any value function network in deep RL with discrete actions and ATreeC, an actor-critic variant that augments TreeQN with a softmax layer to form a stochastic policy network.
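A toy sketch of the tree-structured value computation, assuming per-action latent transitions and a linear value readout; TreeQN additionally predicts per-step rewards, which this sketch omits.

import numpy as np

rng = np.random.default_rng(0)
n_actions, dim = 2, 4
W = rng.normal(size=(n_actions, dim, dim)) * 0.1   # per-action latent transition
v = rng.normal(size=dim)                           # linear value readout

def tree_value(h, depth):
    # Recursive backup: read off a value at the leaves, otherwise
    # expand every action in latent space and back up the max.
    if depth == 0:
        return h @ v
    return max(tree_value(np.tanh(W[a] @ h), depth - 1) for a in range(n_actions))

h0 = rng.normal(size=dim)                          # encoded observation
q = [tree_value(np.tanh(W[a] @ h0), depth=2) for a in range(n_actions)]
print("tree-backed Q-values:", np.round(q, 3))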

Value Prediction Network

TLDR
This paper proposes a novel deep reinforcement learning architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network and outperforms Deep Q-Network on several Atari games even with short-lookahead planning.
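A minimal sketch of the lookahead idea, assuming learned latent transition, reward, and value heads; shown here for a single step, where VPN uses short multi-step rollouts.

import numpy as np

rng = np.random.default_rng(0)
dim, n_actions, gamma = 4, 2, 0.99
T = rng.normal(size=(n_actions, dim, dim)) * 0.1   # latent transition
w_r = rng.normal(size=(n_actions, dim))            # reward head
w_v = rng.normal(size=dim)                         # value head

h = rng.normal(size=dim)                           # encoded observation
# One-step lookahead entirely in latent space:
# Q(s, a) = r_hat(h, a) + gamma * V_hat(next latent).
q = [w_r[a] @ h + gamma * (np.tanh(T[a] @ h) @ w_v) for a in range(n_actions)]
print("planned Q-values:", np.round(q, 3))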

Imagination-Augmented Agents for Deep Reinforcement Learning

TLDR
Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects, show improved data efficiency, performance, and robustness to model misspecification compared to several baselines.
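A rough sketch of the architecture's data flow, assuming a learned latent model and a mean-pooled rollout summary; I2A actually uses a learned rollout encoder and rollout policy, both simplified away here.

import numpy as np

rng = np.random.default_rng(0)
dim, n_actions, horizon = 4, 2, 3
T = rng.normal(size=(n_actions, dim, dim)) * 0.1   # learned latent model

def imagine(h, first_action):
    # Roll the model forward for a few steps, then summarize the
    # trajectory (mean latent here; I2A learns an encoder instead).
    states = []
    for t in range(horizon):
        a = first_action if t == 0 else int(rng.integers(n_actions))
        h = np.tanh(T[a] @ h)
        states.append(h)
    return np.mean(states, axis=0)

h = rng.normal(size=dim)                           # model-free features
rollouts = np.concatenate([imagine(h, a) for a in range(n_actions)])
# The policy sees both paths, so a poor model degrades gracefully
# instead of corrupting the whole decision.
policy_input = np.concatenate([h, rollouts])
print("policy input size:", policy_input.size)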

Universal Planning Networks

TLDR
This work finds that the learned representations are not only effective for goal-directed visual imitation via gradient-based trajectory optimization, but also provide a metric for specifying goals using images.
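A minimal sketch of gradient-based trajectory optimization through a differentiable latent model, assuming linear dynamics so the gradient can be written by hand; the latent distance to the goal encoding is what doubles as a goal-specification metric.

import numpy as np

rng = np.random.default_rng(0)
dim, horizon, lr = 4, 5, 0.1
A = rng.normal(size=(dim, dim)) * 0.3   # latent dynamics (assumed linear here)
B = rng.normal(size=(dim, 1)) * 0.5     # effect of a scalar action

h0 = rng.normal(size=dim)
goal = rng.normal(size=dim)             # goal specified as a latent encoding
actions = np.zeros((horizon, 1))

for _ in range(200):
    # Forward pass: roll the model over the current action sequence.
    hs = [h0]
    for u in actions:
        hs.append(A @ hs[-1] + B @ u)
    err = hs[-1] - goal
    # Backward pass: gradient of ||h_T - goal||^2 w.r.t. each action,
    # written analytically for this linear model.
    grad_h = 2 * err
    for t in reversed(range(horizon)):
        actions[t] -= lr * (B.T @ grad_h)
        grad_h = A.T @ grad_h

print("latent distance to goal:", round(float(np.linalg.norm(err)), 4))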

Model-Based Planning with Discrete and Continuous Actions

TLDR
This work shows that it is in fact possible to effectively perform planning via backprop in discrete action spaces, using a simple parameterization of the action vectors on the simplex combined with input noise when training the forward model.
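A small sketch of the two ingredients the summary names, with illustrative noise scales: actions relaxed onto the probability simplex via a softmax over logits, and noise injected into the action inputs when training the forward model.

import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Planning side: a discrete action is relaxed to a point on the
# probability simplex, so gradients can flow to the logits.
logits = rng.normal(size=3)
action_vector = softmax(logits)

# Training side: noise is injected into the (one-hot) action input
# of the forward model so it behaves sensibly on the simplex
# interior that planning will visit (noise scale is illustrative).
one_hot = np.eye(3)[1]
noisy_input = softmax(np.log(one_hot + 1e-3) + 0.5 * rng.normal(size=3))

print(action_vector.round(3), noisy_input.round(3))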

XLVIN: eXecuted Latent Value Iteration Nets

TLDR
This work proposes eXecuted Latent Value Iteration Networks (XLVINs), which combine recent developments across contrastive self-supervised learning, graph representation learning, and neural algorithmic reasoning to alleviate the key limitations of value iteration networks, successfully deploying VIN-style models on generic environments.

Value Iteration Networks

TLDR
This work introduces the value iteration network (VIN), a fully differentiable neural network with a 'planning module' embedded within; by learning an explicit planning computation, VIN policies generalize better to new, unseen domains.
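For intuition, the computation a VIN embeds: on a grid MDP, one step of value iteration is a local, convolution-like shift of the value map per action followed by a max over actions. A toy deterministic version on a 4x4 grid, with an illustrative reward layout:

import numpy as np

H, W, gamma = 4, 4, 0.9
reward = np.full((H, W), -0.1)
reward[3, 3] = 1.0                      # goal cell
V = np.zeros((H, W))
moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]

for _ in range(50):
    Q = []
    for dy, dx in moves:
        # Each action shifts the value map locally; zero padding
        # plays the role of walls. This shift is what a VIN
        # implements as a convolution over the value channel.
        shifted = np.pad(V, 1)[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
        Q.append(reward + gamma * shifted)
    V = np.max(Q, axis=0)               # max over actions, per cell

print(np.round(V, 2))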

Dyna, an integrated architecture for learning, planning, and reacting

TLDR
Dyna is an AI architecture that integrates learning, planning, and reactive execution; it relies on machine learning methods for learning from examples, yet is not tied to any particular method.
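A minimal Dyna-Q loop on a toy chain environment, showing the integration the TLDR describes: each real step updates Q directly (reactive execution plus direct RL), records a transition model (learning), and then replays simulated transitions from that model (planning). The environment and hyperparameters are illustrative.

import random

random.seed(0)
n_states, n_actions, gamma, alpha = 5, 2, 0.95, 0.1
Q = [[0.0] * n_actions for _ in range(n_states)]
model = {}                    # (s, a) -> (r, s'), filled from real experience

def env_step(s, a):
    # Toy deterministic chain standing in for the real environment.
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == n_states - 1 else 0.0), s2

s = 0
for _ in range(500):
    # Reactive execution: epsilon-greedy action from the current Q.
    if random.random() < 0.1:
        a = random.randrange(n_actions)
    else:
        a = max(range(n_actions), key=lambda a: Q[s][a])
    r, s2 = env_step(s, a)
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])   # direct RL
    model[(s, a)] = (r, s2)                                 # model learning
    # Planning: replay simulated transitions drawn from the model.
    for _ in range(10):
        (ps, pa), (pr, ps2) = random.choice(list(model.items()))
        Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2]) - Q[ps][pa])
    s = 0 if s2 == n_states - 1 else s2

print([round(max(q), 2) for q in Q])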

Learning model-based planning from scratch

TLDR
The "Imagination-based Planner" is introduced, the first model-based, sequential decision-making agent that can learn to construct, evaluate, and execute plans, and also learn elaborate planning strategies in a discrete maze-solving task.