Robust Dynamic Programming

  • Garud N. Iyengar
  • Published 1 May 2005
  • Mathematics, Economics
  • Math. Oper. Res.
In this paper we propose a robust formulation for discrete time dynamic programming (DP). The objective of the robust formulation is to systematically mitigate the sensitivity of the DP optimal policy to ambiguity in the underlying transition probabilities. The ambiguity is modeled by associating a set of conditional measures with each state-action pair. Consequently, in the robust formulation each policy has a set of measures associated with it. We prove that when this set of measures has a… 
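In the tabular finite-state case, the robust recursion replaces the usual expectation over a single transition measure with a worst case over the ambiguity set associated with each state-action pair. A minimal sketch, assuming each ambiguity set is a finite list of candidate distributions (names such as `robust_bellman_update` and `P_sets` are illustrative, not from the paper):

```python
import numpy as np

def robust_bellman_update(V, rewards, P_sets, gamma=0.9):
    """One robust value-iteration sweep.

    V:       (S,) current value estimates
    rewards: (S, A) immediate rewards
    P_sets:  P_sets[s][a] is a list of candidate (S,) transition
             distributions (the ambiguity set for the pair (s, a))
    """
    S, A = rewards.shape
    V_new = np.empty(S)
    for s in range(S):
        q = np.empty(A)
        for a in range(A):
            # the adversary picks the worst measure in the ambiguity set
            worst = min(p @ V for p in P_sets[s][a])
            q[a] = rewards[s, a] + gamma * worst
        V_new[s] = q.max()
    return V_new
```

Iterating this operator converges to the robust value function under the same contraction argument as ordinary value iteration, since the inner minimum preserves the discount-factor contraction.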

Robust Modified Policy Iteration

This work considers the computation of robust DP solutions for discrete-stage, infinite-horizon, discounted problems with finite state and action spaces, and proposes inexact RMPI, in which the inner problem is solved to within a specified tolerance.
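The scheme interleaves a greedy robust policy-improvement step with a small number of partial robust evaluation sweeps under the fixed policy. The sketch below solves the inner (adversary's) problem exactly over a finite candidate set; the paper's "inexact" variant would solve that inner minimization only to a specified tolerance. All names are illustrative:

```python
import numpy as np

def rmpi(rewards, P_sets, gamma=0.9, m=5, iters=50):
    """Robust modified policy iteration sketch.

    rewards: (S, A); P_sets[s][a]: list of candidate (S,) distributions.
    m partial evaluation sweeps are run after each improvement step.
    """
    S, A = rewards.shape
    V = np.zeros(S)
    for _ in range(iters):
        # policy improvement: greedy w.r.t. the robust Q-values
        Q = np.array([[rewards[s, a] +
                       gamma * min(p @ V for p in P_sets[s][a])
                       for a in range(A)] for s in range(S)])
        pi = Q.argmax(axis=1)
        # m partial robust evaluation sweeps under the fixed policy pi
        for _ in range(m):
            V = np.array([rewards[s, pi[s]] +
                          gamma * min(p @ V for p in P_sets[s][pi[s]])
                          for s in range(S)])
    return V, pi
```

With m = 1 this reduces to robust value iteration; large m approaches robust policy iteration.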

Q-Learning for Distributionally Robust Markov Decision Processes

A Q-learning approach is introduced to solve distributionally robust Markov Decision Processes with Borel state and action spaces and infinite time horizon via simulation-based techniques and it is proved that the value function is the unique fixed point of an operator.
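The paper works with Borel spaces and simulation-based estimates; as a much-simplified tabular caricature of the idea, each Q-update can use the worst expected continuation value over a finite set of candidate models for the visited state-action pair (function and variable names are illustrative):

```python
import numpy as np

def robust_q_step(Q, s, a, r, models, alpha=0.1, gamma=0.9):
    """One tabular robust Q-update.

    Q:      (S, A) table; models: list of candidate (S,) transition
    distributions for the pair (s, a); r: observed reward.
    """
    # worst-case expected continuation value over the candidate models
    worst = min(p @ Q.max(axis=1) for p in models)
    Q[s, a] += alpha * (r + gamma * worst - Q[s, a])
    return Q
```

Under the usual step-size conditions such an update tracks the fixed point of the robust Bellman operator rather than the nominal one.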

A Robust Approach to Markov Decision Problems with Uncertain Transition Probabilities

Abstract This paper considers a discrete-time infinite horizon discounted cost Markov decision problem in which the transition probability vector for each state-control pair is uncertain. A popular

Duality in Robust Dynamic Programming: Pricing Convertibles, Stochastic Games and Control

This work studies computational methodologies for developing and validating feasible control policies for the robust dynamic programming problem, and generalizes the information-relaxation and duality approach of Brown, Smith, and Sun to robust multi-period problems.

Distributionally Robust Markov Decision Processes and Their Connection to Risk Measures

Under integrability, continuity, and compactness assumptions, a robust cost iteration for a fixed policy of the decision maker and a value iteration for the robust optimization problem are derived and the existence of deterministic optimal policies for both players is shown.

Robust control of the multi-armed bandit problem

We study a robust model of the multi-armed bandit (MAB) problem in which the transition probabilities are ambiguous and belong to subsets of the probability simplex. We first show that for each arm

Partial Policy Iteration for L1-Robust Markov Decision Processes

This paper proposes partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs, and proposes fast methods for computing the robust Bellman operator in quasi-linear time, nearly matching the linear complexity of the non-robust Bellman operator.
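For an L1 ball around a nominal distribution, the inner problem min { p·v : ‖p − p̂‖₁ ≤ κ, p in the simplex } admits a greedy O(S log S) solution: shift up to κ/2 probability mass onto the lowest-value state, taking it from the highest-value states first. A sketch under that assumption (names are illustrative, and this is one known construction, not necessarily the paper's exact routine):

```python
import numpy as np

def worst_case_l1(p_hat, v, kappa):
    """Minimize p @ v over {p in simplex : ||p - p_hat||_1 <= kappa}."""
    p = p_hat.astype(float).copy()
    i_min = np.argmin(v)
    # at most kappa/2 mass can be added to the cheapest state
    eps = min(kappa / 2.0, 1.0 - p[i_min])
    p[i_min] += eps
    # remove the same mass from the most expensive states first
    for i in np.argsort(v)[::-1]:
        if i == i_min:
            continue
        take = min(p[i], eps)
        p[i] -= take
        eps -= take
        if eps <= 1e-12:
            break
    return p
```

Sorting dominates the cost, which is what makes a quasi-linear robust Bellman backup possible.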

Safe Policy Improvement by Minimizing Robust Baseline Regret

This paper develops and analyzes a new model-based approach to computing a safe policy given access to an inaccurate dynamics model of the system with known accuracy guarantees, using this model to directly minimize the (negative) regret w.r.t. the baseline policy.

Structural properties of a class of robust inventory and queueing control problems

This work identifies the cases where certain monotonicity results still hold and the optimal policy retains a threshold form, and investigates the marginal value of time and the case of uncertain rewards.

Sample Complexity of Robust Reinforcement Learning with a Generative Model

This work proposes a model-based reinforcement learning (RL) algorithm for learning an ε-optimal robust policy when the nominal model is unknown, and considers three forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence.

Robust Portfolio Selection Problems

This paper introduces "uncertainty structures" for the market parameters and shows that the robust portfolio selection problems corresponding to these uncertainty structures can be reformulated as second-order cone programs and, therefore, the computational effort required to solve them is comparable to that required for solving convex quadratic programs.

Robust Control and Model Uncertainty

where Q is a set of measures over c and x, and d is a discount rate. Gilboa and Schmeidler’s theory leaves open how to specify the set Q in particular applications. Criteria like (1) also appear as

Robustness in Markov Decision Problems with Uncertain Transition Matrices

This work proposes an algorithm for solving finite-state and finite-action MDPs whose solution is guaranteed to be robust with respect to estimation errors in the state transition probabilities, via Kullback-Leibler divergence bounds.
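For a KL ambiguity set, the inner worst-case expectation has a one-dimensional convex dual, min { p·v : KL(p‖p̂) ≤ β } = sup over λ > 0 of −λ log E_p̂[e^{−v/λ}] − λβ, so it can be computed by a scalar search over the dual multiplier. A sketch assuming this standard dual form, with a coarse grid standing in for a proper one-dimensional maximization (names are illustrative):

```python
import numpy as np

def kl_worst_case(p_hat, v, beta, lams=None):
    """Worst-case expectation of v over {p : KL(p || p_hat) <= beta},
    computed from the dual: sup_{lam>0} -lam*log E[e^{-v/lam}] - lam*beta.
    """
    if lams is None:
        lams = np.logspace(-3, 3, 2000)  # grid over the dual multiplier
    best = -np.inf
    for lam in lams:
        z = -v / lam
        m = z.max()  # log-sum-exp shift for numerical stability
        lse = m + np.log(np.sum(p_hat * np.exp(z - m)))
        best = max(best, -lam * lse - lam * beta)
    return best
```

As β → 0 the value approaches the nominal expectation, and for large β it approaches the minimum of v, matching the two limits of the ambiguity set.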

Weighted Discounted Stochastic Games with Perfect Information

We consider a two-person zero-sum stochastic game with an infinite-time horizon. The payoff is a linear combination of expected total discounted rewards with different discount factors. For a model

Markovian Decision Processes with Uncertain Transition Probabilities

This paper examines Markovian decision processes in which the transition probabilities corresponding to alternative decisions are not known with certainty and discusses asymptotically Bayes-optimal policies.

Minimax analysis of stochastic problems

It is shown that, under mild regularity conditions, such a min-max problem generates a probability distribution on the set of permissible distributions, with the min-max problem being equivalent to the expected value problem with respect to the corresponding weighted distribution.

The Linear Programming Approach to Approximate Dynamic Programming

An efficient method based on linear programming for approximating solutions to large-scale stochastic control problems, which fits a linear combination of pre-selected basis functions to the dynamic programming cost-to-go function.

Robust Convex Optimization

If U is an ellipsoidal uncertainty set, then for some of the most important generic convex optimization problems (linear programming, quadratically constrained programming, semidefinite programming and others) the corresponding robust convex program is either exactly, or approximately, a tractable problem which lends itself to efficient algorithms such as polynomial-time interior point methods.

Markov Decision Processes: Discrete Stochastic Dynamic Programming

  • M. Puterman
  • Computer Science
    Wiley Series in Probability and Statistics
  • 1994
Markov Decision Processes covers recent research advances in such areas as countable state space models with average reward criterion, constrained models, and models with risk sensitive optimality criteria, and explores several topics that have received little or no attention in other books.