Corpus ID: 239768422

Operator Augmentation for Model-based Policy Evaluation

Xun Tang, Lexing Ying, Yuhua Zhu
In model-based reinforcement learning, the transition matrix and reward vector are often estimated from random samples subject to noise. Even if the estimated model is an unbiased estimate of the true underlying model, the value function computed from the estimated model is biased. We introduce an operator augmentation method for reducing the error introduced by the estimated model. When the error is measured in the residual norm, we prove that the augmentation factor is always positive and upper…
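The plug-in bias described in the abstract can be seen in a small hypothetical example (the two-state chain, sample sizes, and rewards below are invented for illustration, not taken from the paper): even when the estimated transition matrix P̂ is an unbiased estimate of P, the plug-in value function V̂ = (I − γP̂)⁻¹r is biased, because matrix inversion is nonlinear in P.

```python
import numpy as np

# Illustrative sketch (not the paper's method): plug-in policy evaluation
# with an unbiased estimate P_hat of the transition matrix still gives a
# biased value function, since V = (I - gamma*P)^{-1} r is nonlinear in P.
rng = np.random.default_rng(0)
gamma = 0.9
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])          # true transition matrix (2-state chain)
r = np.array([1.0, 0.0])            # true reward vector
V_true = np.linalg.solve(np.eye(2) - gamma * P, r)

# Estimate each row of P from n sampled transitions (unbiased estimator),
# then average the resulting plug-in value functions over many replicates.
n = 20
vals = []
for _ in range(2000):
    P_hat = np.vstack([rng.multinomial(n, row) / n for row in P])
    vals.append(np.linalg.solve(np.eye(2) - gamma * P_hat, r))
bias = np.mean(vals, axis=0) - V_true  # systematically nonzero
```

The averaged plug-in value does not converge to `V_true` as the number of replicates grows; only increasing `n` (more transitions per state) shrinks the bias, which is the error the operator augmentation method targets.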

PILCO: A Model-Based and Data-Efficient Approach to Policy Search
PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way by learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning.
Guided Policy Search
This work presents a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. It shows how differential dynamic programming can generate suitable guiding samples, and describes a regularized importance-sampled policy optimization that incorporates these samples into the policy search.
Operator Augmentation for General Noisy Matrix Systems
The elliptic operator augmentation framework is extended to the general nonsymmetric matrix case, and it is shown that, under the conditions of right-hand-side isotropy and noise symmetry, the optimal operator augmentation factor for the residual error is always positive, thereby making the framework amenable to a necessary bootstrapping step.
Reinforcement Learning: An Introduction
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Operator Augmentation for Noisy Elliptic Systems
The operator augmentation framework is proposed, a collection of easy-to-implement algorithms that augment a noisy inverse operator by subtracting an additional auxiliary term that reduces error in an elliptic linear system with the operator corrupted by noise.
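A scalar caricature can convey why correcting a noisy inverse helps (this is not the paper's augmentation rule, and all numbers below are invented): for a noisy scalar â with E[â] = a, the plug-in inverse 1/â overestimates 1/a by Jensen's inequality, and a shrinkage factor c < 1 chosen to minimize mean squared error reduces that error.

```python
import numpy as np

# Scalar caricature (illustrative only, not the paper's algorithm):
# a_hat is an unbiased noisy observation of a = 1, but 1/a_hat
# overestimates 1/a (Jensen's inequality). A shrinkage factor c < 1
# fitted to minimize the empirical MSE of c/a_hat reduces the error.
rng = np.random.default_rng(1)
a = 1.0
a_hat = a + rng.uniform(-0.3, 0.3, size=100_000)  # unbiased, bounded away from 0

inv = 1.0 / a_hat
c_star = (inv.mean() / a) / np.mean(inv ** 2)     # argmin_c E[(c/a_hat - 1/a)^2]

mse_plugin = np.mean((inv - 1.0 / a) ** 2)
mse_aug = np.mean((c_star * inv - 1.0 / a) ** 2)
# c_star < 1, and mse_aug <= mse_plugin by construction on these samples
```

In the matrix setting the correction is an auxiliary operator rather than a single scalar, and estimating it requires the bootstrapping step mentioned in the summaries above.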
Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
Embed to Control is introduced, a method for model learning and control of non-linear dynamical systems from raw pixel images that is derived directly from an optimal control formulation in latent space and exhibits strong performance on a variety of complex control problems.
Action-Conditional Video Prediction using Deep Networks in Atari Games
This paper is the first to make and evaluate long-term predictions on high-dimensional video conditioned by control inputs and proposes and evaluates two deep neural network architectures that consist of encoding, action-conditional transformation, and decoding layers based on convolutional neural networks and recurrent neural networks.
Mastering the game of Go with deep neural networks and tree search
Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.
An introduction to sampling via measure transport
The fundamentals of a measure transport approach to sampling are presented, describing connections with other optimization-based samplers, with inference and density estimation schemes using optimal transport, and with alternative transformation-based approaches to simulation.
Dyna, an integrated architecture for learning, planning, and reacting
Dyna is an AI architecture that integrates learning, planning, and reactive execution; it relies on machine learning methods for learning from examples, yet is not tied to any particular method.