# Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation

@inproceedings{Corneil2018EfficientMD, title={Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation}, author={Dane S. Corneil and Wulfram Gerstner and Johanni Brea}, booktitle={International Conference on Machine Learning}, year={2018} }

Modern reinforcement learning algorithms reach super-human performance in many board and video games, but they are sample-inefficient, i.e., they typically require significantly more playing experience than humans to reach an equal performance level. […] Key method: prioritized sweeping with small backups, a highly efficient planning method, can then be used to update state-action values. We show how VaST can rapidly learn to maximize reward in tasks like 3D navigation and efficiently adapt to sudden changes in…
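
The key method above can be illustrated with a minimal tabular sketch. It assumes observations have already been mapped to discrete integer states (the tabulation step implied by VaST's name) and that the model is estimated from transition counts. The class `SmallBackupSweeper`, its parameters, and the planning budget `max_updates` are hypothetical illustrations, not the authors' code. Each pair keeps the invariant Q(s, a) = R(s, a) + γ Σ P(s'|s, a) U(s'), where U(s') is the value of s' last propagated to its predecessors, so a change in V(s') reaches each predecessor through a constant-time small backup, and states are processed in order of priority |V(s) − U(s)|.

```python
import heapq
from collections import defaultdict


class SmallBackupSweeper:
    """Hypothetical tabular planner: prioritized sweeping with small backups.

    Invariant after every update:
        Q[(s, a)] = R(s, a) + gamma * sum_{s'} P(s' | s, a) * U[s']
    where U[s'] is the value of s' that was last propagated to predecessors.
    """

    def __init__(self, n_actions, gamma=0.9, theta=1e-6, max_updates=100):
        self.n_actions = n_actions
        self.gamma = gamma
        self.theta = theta              # smallest priority worth propagating
        self.max_updates = max_updates  # planning budget per observed step
        self.Q = defaultdict(float)     # (s, a) -> action value
        self.V = defaultdict(float)     # s -> max_a Q[(s, a)]
        self.U = defaultdict(float)     # s -> value last propagated
        self.n_sa = defaultdict(int)    # (s, a) -> visit count
        self.n_sas = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
        self.r_sum = defaultdict(float)  # (s, a) -> summed reward
        self.preds = defaultdict(set)    # s' -> {(s, a) observed to reach s'}

    def _refresh(self, s):
        """Recompute V[s] and return its priority |V[s] - U[s]|."""
        self.V[s] = max(self.Q[(s, a)] for a in range(self.n_actions))
        return abs(self.V[s] - self.U[s])

    def observe(self, s, a, r, s_next):
        sa = (s, a)
        self.n_sa[sa] += 1
        self.n_sas[sa][s_next] += 1
        self.r_sum[sa] += r
        self.preds[s_next].add(sa)
        n = self.n_sa[sa]
        # The model of (s, a) changed, so redo its full backup against U.
        self.Q[sa] = self.r_sum[sa] / n + self.gamma * sum(
            c / n * self.U[s2] for s2, c in self.n_sas[sa].items())
        self._sweep(s)

    def _sweep(self, start):
        heap = [(-self._refresh(start), start)]
        for _ in range(self.max_updates):
            if not heap:
                break
            _, s2 = heapq.heappop(heap)
            delta = self.V[s2] - self.U[s2]
            if abs(delta) < self.theta:
                continue  # stale duplicate entry or negligible change
            self.U[s2] = self.V[s2]
            for s, a in self.preds[s2]:
                p = self.n_sas[(s, a)][s2] / self.n_sa[(s, a)]
                # Small backup: O(1) work per predecessor pair.
                self.Q[(s, a)] += self.gamma * p * delta
                priority = self._refresh(s)
                if priority >= self.theta:
                    heapq.heappush(heap, (-priority, s))
```

For example, observing `(0, 0, 0.0, 1)` and then `(1, 0, 1.0, 2)` with γ = 0.9 sets Q(1, 0) to 1.0 and then propagates the change backwards so that Q(0, 0) becomes 0.9 without revisiting state 0.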

## 53 Citations

### Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning

- Computer Science · IEEE Transactions on Neural Networks and Learning Systems
- 2021

A variational dynamic model based on conditional variational inference is proposed and derived to model the multimodality and stochasticity of environmental state-action transitions; it allows the agent to learn skills through self-supervised exploration without observing extrinsic rewards.

### Learning discrete state abstractions with deep variational inference

- Computer Science · ArXiv
- 2020

This work proposes an information bottleneck method for learning approximate bisimulations, a type of state abstraction. A deep neural encoder maps states onto continuous embeddings, which an action-conditioned hidden Markov model, trained end-to-end with the neural network, maps onto a discrete representation.

### Learning Markov State Abstractions for Deep Reinforcement Learning

- Computer Science · NeurIPS
- 2021

This work introduces a novel set of conditions and proves that they are sufficient for learning a Markov abstract state representation, and describes a practical training procedure that combines inverse model estimation and temporal contrastive learning to learn an abstraction that approximately satisfies these conditions.

### Towards Robust Bisimulation Metric Learning

- Computer Science · NeurIPS
- 2021

This work generalizes value function approximation bounds for on-policy bisimulation metrics to non-optimal policies and approximate environment dynamics, and proposes a set of practical remedies that are not only more robust to sparse reward functions, but also able to solve challenging continuous control tasks with observational distractions, where prior methods fail.

### Exact (Then Approximate) Dynamic Programming for Deep Reinforcement Learning

- Computer Science
- 2020

This work proposes a simple technique to stabilize deep Q-learning by decoupling dynamic programming from function approximation, and observes that it consistently outperforms prior approaches such as double DQN and BCQ, which often diverge or fail completely, while generalizing more effectively than directly applying the tabular policy.

### Hierarchies of Planning and Reinforcement Learning for Robot Navigation

- Computer Science · 2021 IEEE International Conference on Robotics and Automation (ICRA)
- 2021

This work proposes VI-RL, a hierarchical framework that uses a trainable planning policy for the high-level (HL) representation. It yields consistently strong improvements over vanilla RL, performs on par with vanilla hierarchical RL on single layouts, and is more broadly applicable to multiple layouts.

### DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs

- Computer Science · ICLR
- 2021

This work introduces the Deep Averagers with Costs MDP (DAC-MDP), a non-parametric model that can leverage deep representations and accounts for limited data by introducing costs for exploiting under-represented parts of the model, and investigates its solutions for offline RL.
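
A minimal sketch of the kind of k-nearest-neighbour backup such a derived MDP can use, assuming discrete actions, Euclidean distance in a fixed representation space, and precomputed values for the recorded next states. Each of the k nearest dataset transitions that took the queried action contributes its reward minus a cost proportional to its distance and moves, with probability 1/k, to its recorded next state. The function `dac_mdp_q` and its parameters (`k`, `cost`, `next_values`) are hypothetical, not the paper's implementation; solving the full derived MDP for `next_values` is not shown.

```python
import numpy as np


def dac_mdp_q(state, action, data_states, data_actions, rewards,
              next_values, k=2, cost=1.0, gamma=0.99):
    """Q(s, a) under a kNN 'averager with costs' backup (illustrative sketch).

    Neighbours are drawn only from dataset transitions that took `action`.
    Neighbour i contributes reward r_i - cost * d_i and transitions, with
    probability 1/k, to its recorded next state whose value is next_values[i].
    """
    state = np.asarray(state, dtype=float)
    mask = data_actions == action
    dists = np.linalg.norm(data_states[mask] - state, axis=1)
    nn = np.argsort(dists)[:k]  # indices of the (at most) k nearest neighbours
    penalised = rewards[mask][nn] - cost * dists[nn]  # cost for distant support
    return float(np.mean(penalised + gamma * next_values[mask][nn]))
```

The distance penalty is what discourages the derived policy from relying on state-action pairs that are poorly covered by the data: a query far from every recorded transition receives a low estimated value.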

### Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep Reinforcement Learning

- Computer Science · ArXiv
- 2020

This work constructs a simplified Markov Decision Process for which exact Q-values can be computed efficiently as more data comes in, and shows that the Q-value of each transition in the simplified MDP is a lower bound on the Q-value of the same transition in the original continuous Q-learning problem.
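
Exact Q-values on a finite MDP built from recorded data can be computed with plain value iteration. The sketch below assumes deterministic transitions `(s, a, r, s_next)` and treats any state with no recorded outgoing transition as terminal with value 0; these are simplifying assumptions, not necessarily how the cited work handles unexplored states. `graph_q_values` is a hypothetical name.

```python
from collections import defaultdict


def graph_q_values(transitions, gamma=0.99, tol=1e-10, max_iters=10_000):
    """Value iteration on the finite MDP spanned by recorded transitions.

    Assumes deterministic transitions (s, a, r, s_next); a state with no
    recorded outgoing transition is treated as terminal with value 0.
    """
    edges = {(s, a): (r, s2) for s, a, r, s2 in transitions}
    if not edges:
        return {}
    actions = defaultdict(list)  # state -> actions actually observed there
    for s, a in edges:
        actions[s].append(a)
    q = dict.fromkeys(edges, 0.0)
    for _ in range(max_iters):
        # Greedy value over the observed actions only.
        v = {s: max(q[(s, a)] for a in acts) for s, acts in actions.items()}
        new_q = {sa: r + gamma * v.get(s2, 0.0) for sa, (r, s2) in edges.items()}
        diff = max(abs(new_q[sa] - q[sa]) for sa in q)
        q = new_q
        if diff < tol:
            break
    return q
```

Because only observed actions enter the maximum, the graph never credits a state with an action it has not seen, which is the intuition behind treating these values as conservative estimates.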

### Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning

- Psychology · ICLR
- 2022

This work proposes Value Function Spaces, a simple approach that produces a representation which compactly abstracts task-relevant information and robustly ignores distractors, enabling better zero-shot generalization than alternative model-free and model-based methods.

## References

Showing 1–10 of 41 references.

### Value Prediction Network

- Computer Science · NIPS
- 2017

This paper proposes a novel deep reinforcement learning architecture called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network and outperforms Deep Q-Network on several Atari games even with short-lookahead planning.

### TreeQN and ATreeC: Differentiable Tree Planning for Deep Reinforcement Learning

- Computer Science · ICLR 2018
- 2017

This work proposes TreeQN, a differentiable, recursive, tree-structured model that serves as a drop-in replacement for any value function network in deep RL with discrete actions, and ATreeC, an actor-critic variant that augments TreeQN with a softmax layer to form a stochastic policy network.

### Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning

- Computer Science · 2018 IEEE International Conference on Robotics and Automation (ICRA)
- 2018

It is demonstrated that neural network dynamics models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a model-based reinforcement learning algorithm, producing stable and plausible gaits that accomplish various complex locomotion tasks.

### Model-Free Episodic Control

- Computer Science, Biology · ArXiv
- 2016

This work demonstrates that a simple model of hippocampal episodic control can learn to solve difficult sequential decision-making tasks; it not only attains a highly rewarding strategy significantly faster than state-of-the-art deep reinforcement learning algorithms, but also achieves a higher overall reward on some of the more challenging domains.
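
A minimal sketch of the tabular update behind model-free episodic control, assuming states are already embedded as fixed-length vectors (the original work uses random projections or learned embeddings; that step is not modeled here). Each entry stores the highest discounted return observed after taking an action in a state, Q_EC(s, a) ← max(Q_EC(s, a), R_t), and an unseen state is estimated by averaging its k nearest stored neighbours. `EpisodicControlTable` and `discounted_returns` are hypothetical names.

```python
import numpy as np


def discounted_returns(rewards, gamma):
    """R_t = r_t + gamma * R_{t+1} for every step of a finished episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


class EpisodicControlTable:
    """Per-action table of the best discounted return seen from each state."""

    def __init__(self, n_actions, k=3):
        self.k = k
        self.keys = [[] for _ in range(n_actions)]    # state embeddings
        self.values = [[] for _ in range(n_actions)]  # best returns so far

    def update(self, state, action, ret):
        state = np.asarray(state, dtype=float)
        for i, key in enumerate(self.keys[action]):
            if np.array_equal(key, state):
                # Keep the maximum return ever observed for (s, a).
                self.values[action][i] = max(self.values[action][i], ret)
                return
        self.keys[action].append(state)
        self.values[action].append(ret)

    def estimate(self, state, action):
        state = np.asarray(state, dtype=float)
        keys = self.keys[action]
        if not keys:
            return 0.0
        for key, value in zip(keys, self.values[action]):
            if np.array_equal(key, state):
                return value  # exact match: use the stored return
        # Unseen state: average the k nearest stored neighbours.
        dists = np.linalg.norm(np.stack(keys) - state, axis=1)
        nn = np.argsort(dists)[: self.k]
        return float(np.mean(np.asarray(self.values[action])[nn]))
```

Taking the maximum rather than an average makes the table optimistic about deterministic environments: a single lucky high-return episode is remembered and immediately exploited, which is one reason such agents can attain a rewarding strategy quickly.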

### Imagination-Augmented Agents for Deep Reinforcement Learning

- Computer Science · NIPS
- 2017

Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects, shows improved data efficiency, performance, and robustness to model misspecification compared to several baselines.

### Reinforcement Learning with Unsupervised Auxiliary Tasks

- Computer Science · ICLR
- 2017

The proposed agent significantly outperforms the previous state of the art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional *Labyrinth* tasks, leading to a mean speedup in learning of 10× and averaging 87% expert human performance on Labyrinth.

### Asynchronous Methods for Deep Reinforcement Learning

- Computer Science · ICML
- 2016

This work proposes a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent to optimize deep neural network controllers, and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes from visual input.

### Playing Atari with Deep Reinforcement Learning

- Computer Science · ArXiv
- 2013

This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning; applied to seven Atari 2600 games, it outperforms all previous approaches on six of them and surpasses a human expert on three.

### Neural Episodic Control

- Computer Science · ICML
- 2017

This work proposes Neural Episodic Control, a deep reinforcement learning agent that is able to rapidly assimilate new experiences and act upon them, and shows across a wide range of environments that the agent learns significantly faster than other state-of-the-art, general-purpose deep reinforcement learning agents.

### Efficient planning in MDPs by small backups

- Computer Science · ICML 2013
- 2013

A new planning backup is introduced that uses only the current value of a single successor state and has a computation time independent of the number of successor states, opening the door to a new class of model-based reinforcement learning methods that exhibit much finer control over their planning process than traditional methods.
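
The contrast between a full backup and a small backup can be checked numerically. In this minimal sketch (the function names `full_backup` and `small_backup` are hypothetical), a full backup reads every successor's value, while a small backup corrects an existing Q(s, a) for one successor whose value changed, using only that successor's new value, the value Q(s, a) was last computed with, and its transition probability. After the correction, Q(s, a) equals what a fresh full backup would produce.

```python
def full_backup(reward, probs, values, gamma):
    """Full Bellman backup of one (s, a): reads every successor's value."""
    return reward + gamma * sum(p * values[s2] for s2, p in probs.items())


def small_backup(q, prob, new_value, cached_value, gamma):
    """Small backup: correct q for a single successor whose value moved
    from cached_value (the value q was last computed with) to new_value.
    The cost is O(1), independent of the number of successors."""
    return q + gamma * prob * (new_value - cached_value)


# Usage: one (s, a) with two successors, reached with P = 0.25 and 0.75.
probs = {"s1": 0.25, "s2": 0.75}
values = {"s1": 4.0, "s2": 8.0}
q = full_backup(1.0, probs, values, gamma=0.9)
cached = values["s2"]
values["s2"] = 10.0  # only successor s2 changes
q = small_backup(q, probs["s2"], values["s2"], cached, gamma=0.9)
```

Because the small backup is linear in the change of a single successor value, its cost stays constant as the number of successors grows, which is what allows planners to spend computation only where values actually changed.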