Corpus ID: 239768422

Operator Augmentation for Model-based Policy Evaluation

Xun Tang, Lexing Ying, Yuhua Zhu
In model-based reinforcement learning, the transition matrix and reward vector are often estimated from random samples subject to noise. Even if the estimated model is an unbiased estimate of the true underlying model, the value function computed from the estimated model is biased. We introduce an operator augmentation method for reducing the error introduced by the estimated model. When the error is measured in the residual norm, we prove that the augmentation factor is always positive and upper…
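The plug-in bias described in the abstract can be seen in a small hypothetical example (the two-state chain, sample sizes, and rewards below are invented for illustration, not taken from the paper): even when the estimated transition matrix P̂ is an unbiased estimate of P, the plug-in value function V̂ = (I − γP̂)⁻¹r is biased, because matrix inversion is nonlinear in P.

```python
import numpy as np

# Illustrative sketch (not the paper's method): plug-in policy evaluation
# with an unbiased estimate P_hat of the transition matrix still gives a
# biased value function, since V = (I - gamma*P)^{-1} r is nonlinear in P.
rng = np.random.default_rng(0)
gamma = 0.9
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])          # true transition matrix (2-state chain)
r = np.array([1.0, 0.0])            # true reward vector
V_true = np.linalg.solve(np.eye(2) - gamma * P, r)

# Estimate each row of P from n sampled transitions (unbiased estimator),
# then average the resulting plug-in value functions over many replicates.
n = 20
vals = []
for _ in range(2000):
    P_hat = np.vstack([rng.multinomial(n, row) / n for row in P])
    vals.append(np.linalg.solve(np.eye(2) - gamma * P_hat, r))
bias = np.mean(vals, axis=0) - V_true  # systematically nonzero
```

The averaged plug-in value does not converge to `V_true` as the number of replicates grows; only increasing `n` (more transitions per state) shrinks the bias, which is the error the operator augmentation method targets.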

PILCO: A Model-Based and Data-Efficient Approach to Policy Search
PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way by learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning.
Guided Policy Search
This work presents a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. It shows how differential dynamic programming can generate suitable guiding samples, and describes a regularized importance-sampled policy optimization that incorporates these samples into the policy search.
Operator Augmentation for General Noisy Matrix Systems
The elliptic operator augmentation framework is extended to the general nonsymmetric matrix case, and it is shown that, under the conditions of right-hand-side isotropy and noise symmetry, the optimal operator augmentation factor for the residual error is always positive, thereby making the framework amenable to a necessary bootstrapping step.
Reinforcement Learning: An Introduction
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Operator Augmentation for Noisy Elliptic Systems
The operator augmentation framework is proposed, a collection of easy-to-implement algorithms that augment a noisy inverse operator by subtracting an additional auxiliary term that reduces error in an elliptic linear system with the operator corrupted by noise.
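A scalar caricature can convey why correcting a noisy inverse helps (this is not the paper's augmentation rule, and all numbers below are invented): for a noisy scalar â with E[â] = a, the plug-in inverse 1/â overestimates 1/a by Jensen's inequality, and a shrinkage factor c < 1 chosen to minimize mean squared error reduces that error.

```python
import numpy as np

# Scalar caricature (illustrative only, not the paper's algorithm):
# a_hat is an unbiased noisy observation of a = 1, but 1/a_hat
# overestimates 1/a (Jensen's inequality). A shrinkage factor c < 1
# fitted to minimize the empirical MSE of c/a_hat reduces the error.
rng = np.random.default_rng(1)
a = 1.0
a_hat = a + rng.uniform(-0.3, 0.3, size=100_000)  # unbiased, bounded away from 0

inv = 1.0 / a_hat
c_star = (inv.mean() / a) / np.mean(inv ** 2)     # argmin_c E[(c/a_hat - 1/a)^2]

mse_plugin = np.mean((inv - 1.0 / a) ** 2)
mse_aug = np.mean((c_star * inv - 1.0 / a) ** 2)
# c_star < 1, and mse_aug <= mse_plugin by construction on these samples
```

In the matrix setting the correction is an auxiliary operator rather than a single scalar, and estimating it requires the bootstrapping step mentioned in the summaries above.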
Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
Embed to Control is introduced, a method for model learning and control of non-linear dynamical systems from raw pixel images that is derived directly from an optimal control formulation in latent space and exhibits strong performance on a variety of complex control problems.
Action-Conditional Video Prediction using Deep Networks in Atari Games
This paper is the first to make and evaluate long-term predictions on high-dimensional video conditioned by control inputs and proposes and evaluates two deep neural network architectures that consist of encoding, action-conditional transformation, and decoding layers based on convolutional neural networks and recurrent neural networks.
Mastering the game of Go with deep neural networks and tree search
Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.
An introduction to sampling via measure transport
The fundamentals of a measure transport approach to sampling are presented, describing connections with other optimization-based samplers, with inference and density estimation schemes using optimal transport, and with alternative transformation-based approaches to simulation.
Dyna, an integrated architecture for learning, planning, and reacting
Dyna is an AI architecture that integrates learning, planning, and reactive execution; it relies on machine learning methods for learning from examples, yet is not tied to any particular method.