Corpus ID: 239768422

Operator Augmentation for Model-based Policy Evaluation

@article{Tang2021OperatorAF,
  title={Operator Augmentation for Model-based Policy Evaluation},
  author={Xun Tang and Lexing Ying and Yuhua Zhu},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.12658}
}
In model-based reinforcement learning, the transition matrix and reward vector are often estimated from random samples subject to noise. Even if the estimated model is an unbiased estimate of the true underlying model, the value function computed from the estimated model is biased. We introduce an operator augmentation method for reducing the error introduced by the estimated model. When the error is in the residual norm, we prove that the augmentation factor is always positive and upper… 
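The bias described above can be seen in a small numerical experiment. The following is a minimal sketch (not the paper's augmentation method): for a hypothetical 3-state Markov reward process, the empirical transition matrix built from samples is an unbiased estimate of the true one, yet plugging it into the Bellman solve V = (I - γP)⁻¹r produces a biased value estimate, because the matrix inverse is nonlinear in P. All numbers (P, r, γ, sample counts) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state Markov reward process (illustrative values).
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])   # true transition matrix
r = np.array([1.0, 0.0, 2.0])    # true reward vector
gamma = 0.9

# True value function: V = (I - gamma * P)^{-1} r.
V_true = np.linalg.solve(np.eye(3) - gamma * P, r)

n_samples, n_trials = 10, 5000
P_hat_mean = np.zeros_like(P)
V_hat_mean = np.zeros_like(r)
for _ in range(n_trials):
    # Empirical transition matrix from n_samples draws per state;
    # each row is an unbiased estimate of the corresponding row of P.
    P_hat = np.vstack([rng.multinomial(n_samples, row) / n_samples
                       for row in P])
    P_hat_mean += P_hat / n_trials
    # Plug-in value estimate from the estimated model.
    V_hat_mean += np.linalg.solve(np.eye(3) - gamma * P_hat, r) / n_trials

bias = V_hat_mean - V_true
print("max deviation of E[P_hat] from P:", np.max(np.abs(P_hat_mean - P)))
print("value-function bias per state:", bias)
```

Even though `P_hat_mean` converges to `P`, the averaged plug-in value `V_hat_mean` does not converge to `V_true`; reducing this systematic error is what the proposed operator augmentation targets.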


References

Showing 1-10 of 37 references
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way by learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning.
Guided Policy Search
This work presents a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima, and shows how differential dynamic programming can be used to generate suitable guiding samples, and describes a regularized importance sampled policy optimization that incorporates these samples into the policy search.
Operator Augmentation for General Noisy Matrix Systems
The elliptic operator augmentation framework is extended to the general nonsymmetric matrix case, and it is shown that, under right-hand-side isotropy and noise symmetry, the optimal operator augmentation factor for the residual error is always positive, thereby making the framework amenable to a necessary bootstrapping step.
Reinforcement Learning: An Introduction
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Operator Augmentation for Noisy Elliptic Systems
The operator augmentation framework is proposed, a collection of easy-to-implement algorithms that augment a noisy inverse operator by subtracting an additional auxiliary term that reduces error in an elliptic linear system with the operator corrupted by noise.
Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
Embed to Control is introduced, a method for model learning and control of non-linear dynamical systems from raw pixel images that is derived directly from an optimal control formulation in latent space and exhibits strong performance on a variety of complex control problems.
Action-Conditional Video Prediction using Deep Networks in Atari Games
This paper is the first to make and evaluate long-term predictions on high-dimensional video conditioned by control inputs and proposes and evaluates two deep neural network architectures that consist of encoding, action-conditional transformation, and decoding layers based on convolutional neural networks and recurrent neural networks.
Mastering the game of Go with deep neural networks and tree search
Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs and defeated the human European Go champion by 5 games to 0, the first time a computer program has defeated a human professional player in the full-sized game of Go.
An introduction to sampling via measure transport
The fundamentals of a measure transport approach to sampling are presented, describing connections with other optimization-based samplers, with inference and density estimation schemes using optimal transport, and with alternative transformation-based approaches to simulation.
Dyna, an integrated architecture for learning, planning, and reacting
Dyna is an AI architecture that integrates learning, planning, and reactive execution that relies on machine learning methods for learning from examples, yet is not tied to any particular method.
...