Corpus ID: 393513

Approximate Dynamic Programming By Minimizing Distributionally Robust Bounds

@article{Petrik2012ApproximateDP,
  title={Approximate Dynamic Programming By Minimizing Distributionally Robust Bounds},
  author={Marek Petrik},
  journal={ArXiv},
  year={2012},
  volume={abs/1205.1782}
}
Approximate dynamic programming (ADP) is a popular method for solving large Markov decision processes. This paper describes a new class of ADP methods, distributionally robust ADP (DRADP), that address the curse of dimensionality by minimizing a pessimistic bound on the policy loss. This approach turns ADP into an optimization problem, for which we derive new mathematical program formulations and analyze their properties. DRADP improves on the theoretical guarantees of existing…
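The abstract's core idea, minimizing a worst-case (distributionally robust) bound rather than an expected error, can be illustrated with a toy convex program. The sketch below is only a generic illustration on assumed data (a fixed-policy MDP `P`, `r`, features `Phi`, and a finite set `D` of candidate state distributions), not the DRADP formulation derived in the paper; it assumes the cvxpy package is available.

```python
import cvxpy as cp
import numpy as np

# Toy fixed-policy MDP (all numbers made up for illustration).
gamma = 0.9
P = np.array([[0.9, 0.1], [0.2, 0.8]])      # transition matrix under a fixed policy
r = np.array([1.0, 0.0])                    # rewards under that policy
Phi = np.array([[1.0], [2.0]])              # one linear feature per state
D = np.array([[1.0, 0.0],                   # candidate state distributions (simplex rows)
              [0.5, 0.5],
              [0.0, 1.0]])

w = cp.Variable(1)
residual = Phi @ w - r - gamma * (P @ Phi) @ w      # Bellman residual of v = Phi w
pessimistic_bound = cp.max(D @ cp.abs(residual))    # worst case over the distribution set
prob = cp.Problem(cp.Minimize(pessimistic_bound))
prob.solve()
print("weights:", w.value, "bound:", prob.value)
```

Minimizing the worst case over a set of distributions, rather than an average, is what makes the resulting bound pessimistic and the guarantee robust.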

Citations

Fast Bellman Updates for Robust MDPs
Describes two efficient and exact algorithms for computing Bellman updates in robust Markov decision processes (MDPs); the algorithms compute the primal solution in addition to the optimal objective value, which makes them useful in policy iteration methods.
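The exact updates in that line of work exploit the structure of the ambiguity set. For the common (s, a)-rectangular L1 ball, the worst-case expectation behind each robust Bellman update can be computed greedily; below is a minimal numpy sketch of that standard computation (the budget convention and any state weighting are simplified relative to the paper).

```python
import numpy as np

def worst_case_expectation_l1(p_bar, v, xi):
    """Minimize p @ v over distributions p with ||p - p_bar||_1 <= xi."""
    p = np.asarray(p_bar, dtype=float).copy()
    i_min = int(np.argmin(v))
    eps = min(xi / 2.0, 1.0 - p[i_min])   # mass shifted onto the lowest-value state
    p[i_min] += eps
    for i in np.argsort(v)[::-1]:         # remove that mass from the highest-value states
        if eps <= 1e-12:
            break
        if i == i_min:
            continue
        take = min(p[i], eps)
        p[i] -= take
        eps -= take
    return float(p @ v), p
```

A robust Bellman update for a state-action pair then evaluates this worst case with v set to the current value estimate, so each update costs O(n log n) from the sort.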
Partial Policy Iteration for L1-Robust Markov Decision Processes
This paper proposes partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs, together with fast methods that compute the robust Bellman operator in quasi-linear time, nearly matching the linear complexity of the non-robust Bellman operator.
RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning
We describe how to use robust Markov decision processes for value function approximation with state aggregation. The robustness serves to reduce the sensitivity to the approximation error of…
Tight Bayesian Ambiguity Sets for Robust MDPs
RSVF is proposed, which achieves less conservative solutions with the same worst-case guarantees by leveraging a Bayesian prior, optimizing the size and location of the ambiguity set, and relaxing the requirement that the set is a confidence interval.
Non-asymptotic Performances of Robust Markov Decision Processes
This paper considers three different uncertainty sets, including L1, χ², and KL balls, under both (s, a)-rectangular and s-rectangular assumptions, to establish the non-asymptotic performance of the optimal policy of the robust value function under the true transition dynamics.
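For the KL ball in particular, the worst-case expectation used in such analyses is typically evaluated through its one-dimensional convex dual. The sketch below shows only that standard dual (not the paper's finite-sample analysis); `p_bar`, `v`, and `delta` are assumed inputs.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def worst_case_expectation_kl(p_bar, v, delta):
    """min over {p : KL(p || p_bar) <= delta} of p @ v, via the dual
    sup_{beta >= 0} -beta * log E_{p_bar}[exp(-v / beta)] - beta * delta."""
    def neg_dual(beta):
        return beta * logsumexp(-v / beta, b=p_bar) + beta * delta
    res = minimize_scalar(neg_dual, bounds=(1e-6, 1e3), method="bounded")
    return -res.fun
```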
Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty
This paper formulates the general problem of robust constrained MDPs (RCMDPs), derives a Lagrangian formulation of the resulting optimization problem that leads to a robust-constrained policy gradient RL algorithm, and validates the approach on an inventory management problem.
Kernel-Based Reinforcement Learning in Robust Markov Decision Processes
These results are extended to the much larger class of kernel-based approximators, and it is shown, both analytically and empirically, that the robust policies can significantly outperform their non-robust counterparts.
Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs
This paper proposes a new paradigm that achieves better solutions with the same robustness guarantees without using confidence regions as ambiguity sets, optimizing the size and position of the ambiguity sets using Bayesian inference.
Approximate Dynamic Programming for Commodity and Energy Merchant Operations
This work develops approximate dynamic programming (ADP) methods for computing near-optimal policies and lower and upper bounds on the market value of these assets, and unifies different ADP methods in the literature, including the least squares Monte Carlo (LSM) method from financial engineering, under an approximate-linear-programming relaxation framework.

References

Showing 1-10 of 21 references
The Linear Programming Approach to Approximate Dynamic Programming
An efficient method based on linear programming for approximating solutions to large-scale stochastic control problems, which "fits" a linear combination of pre-selected basis functions to the dynamic programming cost-to-go function.
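As a concrete illustration of that linear program, a minimal sketch of the approximate linear program (ALP) on a made-up two-state, two-action MDP follows; the rewards, features, and state-relevance weights are invented for the example, and with tabular features the LP recovers the exact optimal value function.

```python
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],     # P[s, a, s']
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],                   # R[s, a]
              [0.0, 2.0]])
Phi = np.eye(2)                             # tabular features for simplicity
c = np.array([0.5, 0.5])                    # state-relevance weights

# ALP: minimize c^T Phi w  s.t.  (Phi w)(s) >= R(s, a) + gamma * P[s, a] @ (Phi w)  for all s, a
A_ub, b_ub = [], []
for s in range(2):
    for a in range(2):
        A_ub.append(-(Phi[s] - gamma * P[s, a] @ Phi))
        b_ub.append(-R[s, a])
res = linprog(c @ Phi, A_ub=np.array(A_ub), b_ub=b_ub, bounds=[(None, None)] * 2)
print("weights:", res.x, "approximate values:", Phi @ res.x)
```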
Error Bounds for Approximate Policy Iteration
In Dynamic Programming, convergence of algorithms such as Value Iteration or Policy Iteration results, in discounted problems, from a contraction property of the back-up operator, guaranteeing…
Robust Approximate Bilinear Programming for Value Function Approximation
This paper proposes a new approximate bilinear programming formulation of value function approximation, which employs global optimization and provides strong a priori guarantees on both robust and expected policy loss by minimizing specific norms of the Bellman residual.
Robust Value Function Approximation Using Bilinear Programming
Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose approximate bilinear programming, a new…
Global Optimization: Deterministic Approaches
This study develops a unifying approach to constrained global optimization. It provides insight into the underlying concepts and properties of diverse techniques recently proposed to solve a wide…
Symmetric approximate linear programming for factored MDPs with application to constrained problems
D. Dolgov, E. Durfee. Annals of Mathematics and Artificial Intelligence, 2006.
A composite approach is developed that symmetrically approximates the primal and dual optimization variables (effectively approximating both the objective function and the feasible region of the LP), leading to a formulation that is computationally feasible and suitable for solving constrained MDPs.
New representations and approximations for sequential decision making under uncertainty
This dissertation research tackled three outstanding issues in sequential decision making in uncertain environments: performing stable generalization during off-policy updates, balancing exploration with exploitation, and handling partial observability of the environment.
Distributionally Robust Optimization Under Moment Uncertainty with Application to Data-Driven Problems
This paper proposes a model that describes uncertainty in both the distribution form (discrete, Gaussian, exponential, etc.) and moments (mean and covariance matrix) and demonstrates that for a wide range of cost functions the associated distributionally robust stochastic program can be solved efficiently.
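For reference, the moment-based ambiguity set studied in that paper has, up to notation, roughly the following form, with nominal mean $\mu_0$, nominal covariance $\Sigma_0$, and radii $\gamma_1, \gamma_2$ estimated from data:

```latex
\mathcal{D}(\gamma_1,\gamma_2) =
\Bigl\{\, \mathbb{P} :
  (\mathbb{E}_{\mathbb{P}}[\xi]-\mu_0)^{\top}\Sigma_0^{-1}(\mathbb{E}_{\mathbb{P}}[\xi]-\mu_0) \le \gamma_1,\;
  \mathbb{E}_{\mathbb{P}}\bigl[(\xi-\mu_0)(\xi-\mu_0)^{\top}\bigr] \preceq \gamma_2\,\Sigma_0
\,\Bigr\}
```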
Stable Dual Dynamic Programming
This paper investigates the convergence properties of these dual algorithms both theoretically and empirically, and shows how they can be scaled up by incorporating function approximation.
Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes
The proposed L1-regularization method can automatically select the appropriate richness of features, and its performance does not degrade with an increasing number of features, relying on new and stronger sampling bounds for regularized approximate linear programs.
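Continuing the ALP sketch shown earlier in this reference list, adding the L1 regularizer to the objective is a one-line change in a convex-programming formulation. This is only an assumed illustration (the regularization weight `lam` and the MDP data are made up, and the paper's regularized ALP and its sampling bounds involve more structure); it assumes cvxpy.

```python
import cvxpy as cp
import numpy as np

gamma, lam = 0.9, 0.1                        # lam: made-up regularization weight
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])
Phi = np.eye(2)
c = np.array([0.5, 0.5])

w = cp.Variable(2)
constraints = [Phi[s] @ w >= R[s, a] + gamma * (P[s, a] @ Phi) @ w
               for s in range(2) for a in range(2)]
objective = cp.Minimize(c @ Phi @ w + lam * cp.norm1(w))   # ALP objective + L1 penalty
cp.Problem(objective, constraints).solve()
print("selected weights:", w.value)
```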
…