Approximate Dynamic Programming By Minimizing Distributionally Robust Bounds
@article{Petrik2012ApproximateDP, title={Approximate Dynamic Programming By Minimizing Distributionally Robust Bounds}, author={Marek Petrik}, journal={ArXiv}, year={2012}, volume={abs/1205.1782} }
Approximate dynamic programming (ADP) is a popular method for solving large Markov decision processes. This paper describes a new class of ADP methods, distributionally robust ADP (DRADP), that address the curse of dimensionality by minimizing a pessimistic bound on the policy loss. This approach turns ADP into an optimization problem, for which we derive new mathematical program formulations and analyze their properties. DRADP improves on the theoretical guarantees of existing…
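The abstract's core move, replacing fixed-point iteration with the minimization of a pessimistic bound, can be illustrated with a toy sketch. The sketch below is not the paper's DRADP formulation (which works with distribution sets and derives tighter bounds); it only shows the generic pattern of scoring candidate value functions by a classical worst-case policy-loss bound and picking the minimizer. All names and shapes are illustrative.

```python
import numpy as np

def policy_loss_bound(v, P, r, gamma):
    # Classical pessimistic bound on the loss of the greedy policy for v:
    #   ||v* - v_pi||_inf <= 2 / (1 - gamma) * ||Lv - v||_inf
    # (DRADP derives tighter, distributionally robust bounds; this is only
    # the generic L-infinity version.)
    # P: (A, S, S) transitions, r: (A, S) rewards, v: (S,) candidate values.
    Lv = np.max(r + gamma * P @ v, axis=0)  # Bellman optimality backup
    return 2.0 / (1.0 - gamma) * np.max(np.abs(Lv - v))

def bound_minimizing_adp(candidates, P, r, gamma):
    # ADP as optimization: choose the representable value function that
    # minimizes the pessimistic bound instead of iterating to a fixed point.
    return min(candidates, key=lambda v: policy_loss_bound(v, P, r, gamma))
```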
10 Citations
Fast Bellman Updates for Robust MDPs
- Computer Science, ICML
- 2018
Two efficient and exact algorithms are described for computing Bellman updates in robust Markov decision processes (MDPs); they compute the primal solution in addition to the optimal objective value, which makes them useful in policy iteration methods.
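For intuition on what such an exact robust Bellman update computes, here is a minimal sketch for the common (s, a)-rectangular L1 ambiguity set, assuming a finite MDP. The greedy mass-shifting procedure below is a standard O(n log n) construction, not necessarily this paper's algorithm, and it returns the worst-case distribution (the primal solution) alongside the objective value.

```python
import numpy as np

def worst_case_l1(v, p_nominal, kappa):
    """Solve min_p { p @ v : ||p - p_nominal||_1 <= kappa, p in simplex }.

    At most kappa / 2 of probability mass can move, and it is optimal to
    shift it from the highest-value states to the single lowest-value
    state. Returns (objective value, worst-case distribution p).
    """
    p = np.asarray(p_nominal, dtype=float).copy()
    budget = kappa / 2.0
    lowest = int(np.argmin(v))
    for i in np.argsort(v)[::-1]:        # highest values first
        if i == lowest or budget <= 0.0:
            continue
        move = min(p[i], budget)         # take mass from state i
        p[i] -= move
        p[lowest] += move
        budget -= move
    return float(p @ v), p
```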
Partial Policy Iteration for L1-Robust Markov Decision Processes
- Computer Science, J. Mach. Learn. Res.
- 2021
This paper proposes partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs, together with fast methods that compute the robust Bellman operator in quasi-linear time, nearly matching the linear time complexity of the non-robust Bellman operator.
RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning
- Computer Science, NIPS
- 2014
We describe how to use robust Markov decision processes for value function approximation with state aggregation. The robustness serves to reduce the sensitivity to the approximation error of…
Tight Bayesian Ambiguity Sets for Robust MDPs
- Computer Science, ArXiv
- 2018
RSVF is proposed, which achieves less conservative solutions with the same worst-case guarantees by leveraging a Bayesian prior, optimizing the size and location of the ambiguity set, and relaxing the requirement that the set is a confidence interval.
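As a rough illustration of building an ambiguity set from a posterior (not RSVF itself, which additionally optimizes the set's size and location against the value function), one naive construction centers an L1 ball at the posterior mean and takes the smallest radius covering a desired fraction of posterior samples; function and parameter names are illustrative.

```python
import numpy as np

def posterior_l1_ball(samples, alpha=0.05):
    # samples: (m, S) posterior draws of one transition distribution.
    # Center the set at the posterior mean and pick the smallest L1
    # radius that covers a (1 - alpha) fraction of the draws.
    center = samples.mean(axis=0)
    radii = np.abs(samples - center).sum(axis=1)
    return center, float(np.quantile(radii, 1.0 - alpha))
```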
Non-asymptotic Performances of Robust Markov Decision Processes
- Mathematics, ArXiv
- 2021
This paper considers three different uncertainty sets, the L1, χ² and KL balls, under both (s, a)-rectangularity and s-rectangularity assumptions, and derives non-asymptotic performance guarantees for the optimal robust policy's value function under the true transition dynamics.
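The KL ball mentioned here admits a convenient one-dimensional dual for the worst-case expectation, a standard result in KL-constrained distributionally robust optimization. A sketch of that computation follows, assuming a strictly positive nominal distribution; the helper name is illustrative.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.optimize import minimize_scalar

def kl_worst_case(v, p_nominal, kappa):
    """min_p { p @ v : KL(p || p_nominal) <= kappa } via its scalar dual:
        max_{lam > 0}  -lam * log E_pbar[exp(-v / lam)] - lam * kappa
    Assumes p_nominal > 0 componentwise.
    """
    logp = np.log(p_nominal)
    def neg_dual(lam):
        # Negative dual objective; logsumexp keeps it numerically stable.
        return lam * logsumexp(logp - v / lam) + lam * kappa
    res = minimize_scalar(neg_dual, bounds=(1e-8, 1e8), method="bounded")
    return -res.fun
```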
Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty
- Computer Science, ArXiv
- 2020
This paper formulates the general problem as a robust constrained MDP (RCMDP), proposes a Lagrangian formulation of the resulting optimization problem that leads to a robust-constrained policy gradient RL algorithm, and validates the concept on an inventory management problem.
Kernel-Based Reinforcement Learning in Robust Markov Decision Processes
- Computer Science, ICML
- 2019
These results are extended to the much larger class of kernel-based approximators, and it is shown, both analytically and empirically, that the robust policies can significantly outperform their non-robust counterparts.
Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs
- Computer Science, NeurIPS
- 2019
This paper proposes a new paradigm that can achieve better solutions with the same robustness guarantees without using confidence regions as ambiguity sets, and optimizes the size and position of ambiguity sets using Bayesian inference.
Approximate Dynamic Programming for Commodity and Energy Merchant Operations
- Computer Science
- 2014
This work develops approximate dynamic programming (ADP) methods for computing near-optimal policies and lower and upper bounds on the market value of these assets, and unifies different ADP methods in the literature using the approximate linear programming (ALP) relaxation framework, including the least squares Monte Carlo (LSM) method from financial engineering.
References
Showing 1-10 of 21 references
The Linear Programming Approach to Approximate Dynamic Programming
- Computer Science, Oper. Res.
- 2003
An efficient method based on linear programming for approximating solutions to large-scale stochastic control problems, which "fits" a linear combination of pre-selected basis functions to the dynamic programming cost-to-go function.
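For concreteness, the LP in question can be written down directly for a small tabular MDP. The sketch below, assuming SciPy's `linprog` and illustrative array shapes, fits Φw to the cost-to-go by minimizing c·Φw subject to the Bellman inequalities Φw ≥ r_a + γ P_a Φw.

```python
import numpy as np
from scipy.optimize import linprog

def approximate_lp(P, r, Phi, c, gamma):
    """P: (A, S, S) transitions, r: (A, S) rewards, Phi: (S, k) basis,
    c: (S,) state-relevance weights. Solves
        min_w  c @ Phi @ w   s.t.   Phi w >= r_a + gamma P_a Phi w  for all a.
    """
    A, S, _ = P.shape
    k = Phi.shape[1]
    # Rearranged constraints: (gamma P_a - I) Phi w <= -r_a
    G = np.vstack([(gamma * P[a] - np.eye(S)) @ Phi for a in range(A)])
    h = np.concatenate([-r[a] for a in range(A)])
    res = linprog(c @ Phi, A_ub=G, b_ub=h, bounds=[(None, None)] * k)
    return res.x
```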
Error Bounds for Approximate Policy Iteration
- Mathematics, ICML
- 2003
In Dynamic Programming, convergence of algorithms such as Value Iteration or Policy Iteration results, in discounted problems, from a contraction property of the back-up operator, guaranteeing…
Robust Approximate Bilinear Programming for Value Function Approximation
- Computer Science, J. Mach. Learn. Res.
- 2011
A new approximate bilinear programming formulation of value function approximation is presented, which employs global optimization and provides strong a priori guarantees on both robust and expected policy loss by minimizing specific norms of the Bellman residual.
Robust Value Function Approximation Using Bilinear Programming
- Computer Science, NIPS
- 2009
Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose approximate bilinear programming, a new…
Global Optimization: Deterministic Approaches
- Computer Science
- 1992
This study develops a unifying approach to constrained global optimization. It provides insight into the underlying concepts and properties of diverse techniques recently proposed to solve a wide…
Symmetric approximate linear programming for factored MDPs with application to constrained problems
- Computer Science, Mathematics, Annals of Mathematics and Artificial Intelligence
- 2006
A composite approach is developed that symmetrically approximates the primal and dual optimization variables (effectively approximating both the objective function and the feasible region of the LP), leading to a formulation that is computationally feasible and suitable for solving constrained MDPs.
New representations and approximations for sequential decision making under uncertainty
- Computer Science
- 2007
This dissertation tackles three outstanding issues in sequential decision making in uncertain environments: performing stable generalization during off-policy updates, balancing exploration with exploitation, and handling partial observability of the environment.
Distributionally Robust Optimization Under Moment Uncertainty with Application to Data-Driven Problems
- Computer Science, Oper. Res.
- 2010
This paper proposes a model that describes uncertainty in both the distribution form (discrete, Gaussian, exponential, etc.) and moments (mean and covariance matrix) and demonstrates that for a wide range of cost functions the associated distributionally robust stochastic program can be solved efficiently.
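A one-line calculation suggests why moment information alone can pin down such problems for quadratic costs: the expectation of a quadratic depends only on the mean and covariance, so every distribution sharing (μ, Σ) gives the same value (an illustrative identity, not this paper's general model):

```latex
\mathbb{E}\!\left[x^\top Q x + b^\top x\right]
  = \operatorname{tr}(Q\Sigma) + \mu^\top Q \mu + b^\top \mu
\qquad \text{for every distribution with mean } \mu \text{ and covariance } \Sigma .
```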
Stable Dual Dynamic Programming
- Computer Science, NIPS
- 2007
This paper investigates the convergence properties of dual dynamic programming algorithms, both theoretically and empirically, and shows how they can be scaled up by incorporating function approximation.
Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes
- Computer Science, ICML
- 2010
The proposed L1 regularization method can automatically select the appropriate richness of features, and its performance does not degrade with an increasing number of features; the analysis relies on new and stronger sampling bounds for regularized approximate linear programs.
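Adding such a regularizer to an approximate linear program keeps everything linear via the usual variable split. A sketch follows, assuming the same array shapes as a plain ALP (P: transitions, r: rewards, Phi: basis, c: state-relevance weights); the function name and budget parameter `psi` are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def l1_regularized_alp(P, r, Phi, c, gamma, psi):
    # Split w = wp - wn with wp, wn >= 0, so that ||w||_1 <= 1 @ (wp + wn)
    # and the budget ||w||_1 <= psi becomes one extra linear row on top of
    # the ALP constraints (gamma P_a - I) Phi w <= -r_a.
    A, S, _ = P.shape
    k = Phi.shape[1]
    G = np.vstack([(gamma * P[a] - np.eye(S)) @ Phi for a in range(A)])
    G2 = np.vstack([np.hstack([G, -G]), np.ones(2 * k)])
    h = np.append(np.concatenate([-r[a] for a in range(A)]), psi)
    obj = np.concatenate([c @ Phi, -(c @ Phi)])
    res = linprog(obj, A_ub=G2, b_ub=h)  # default bounds give wp, wn >= 0
    return res.x[:k] - res.x[k:]
```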