• Corpus ID: 6807982

Regret based Robust Solutions for Uncertain Markov Decision Processes

Asrar Ahmed, Pradeep Varakantham, Yossiri Adulyasak, Patrick Jaillet
In this paper, we seek robust policies for uncertain Markov Decision Processes (MDPs). Most robust optimization approaches for these problems have focused on the computation of maximin policies, which maximize the value corresponding to the worst realization of the uncertainty. Recent work has proposed minimax regret as a suitable alternative to the maximin objective for robust optimization. However, existing algorithms for handling minimax regret are restricted to models with uncertainty over… 
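The two objectives contrasted in the abstract can be written compactly. The notation below is a generic formulation sketch, not taken from the paper itself: Ξ denotes the uncertainty set over MDP models, V^π(ξ) the value of policy π in model ξ, and π*_ξ the optimal policy for realization ξ:

```latex
% Maximin: maximize the worst-case value over the uncertainty set
\pi^{*}_{\text{maximin}} = \arg\max_{\pi} \; \min_{\xi \in \Xi} V^{\pi}(\xi)

% Minimax regret: minimize the worst-case loss relative to the
% best policy \pi^{*}_{\xi} for each realization \xi
\pi^{*}_{\text{regret}} = \arg\min_{\pi} \; \max_{\xi \in \Xi}
    \left[ V^{\pi^{*}_{\xi}}(\xi) - V^{\pi}(\xi) \right]
```

Maximin can be overly conservative because it judges a policy only by its single worst model; minimax regret instead compares against what was achievable in each model.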

Sampling Based Approaches for Minimizing Regret in Uncertain Markov Decision Processes (MDPs)

A general model of uncertain MDPs that considers uncertainty over both transition and reward functions is presented, and it is observed that optimizing the myopic variants of regret, OSR and CEMR, performs better than directly optimizing the regret.

Minimax Regret Optimisation for Robust Planning in Uncertain Markov Decision Processes

This work proposes a dynamic programming algorithm that utilises the regret Bellman equation, and shows that it optimises minimax regret exactly for UMDPs with independent uncertainties and coupled uncertainties.

Solving Uncertain MDPs with Objectives that Are Separable over Instantiations of Model Uncertainty

This work identifies two separable objectives for uncertain MDPs, Average Value Maximization (AVM) and Confidence Probability Maximization (CPM), and provides optimization-based solutions to compute policies for uncertain MDPs with such objectives.

Safe Policy Improvement by Minimizing Robust Baseline Regret

This paper develops and analyzes a new model-based approach to compute a safe policy when the authors have access to an inaccurate dynamics model of the system with known accuracy guarantees, and uses this model to directly minimize the (negative) regret w.r.t. the baseline policy.

Analysis of approximation and uncertainty in optimization

A probabilistic analysis of knapsack problems, proving that rollout algorithms perform significantly better than their base policies, and a randomized model for minimax regret in combinatorial optimization under cost uncertainty.

Robust Policy Optimization with Baseline Guarantees

Novel model-based safe policy search algorithms that provide performance guarantees and exhibit a trade-off between complexity and conservatism are developed, and their effectiveness is illustrated with a numerical example.

Bayesian Robust Optimization for Imitation Learning

BROIL leverages Bayesian reward function inference and a user-specific risk tolerance to efficiently optimize a robust policy that balances expected return and conditional value at risk, and outperforms existing risk-sensitive and risk-neutral inverse reinforcement learning algorithms.

Multi-Agent Planning with Baseline Regret Minimization

We propose a novel baseline regret minimization algorithm for multi-agent planning problems modeled as finite-horizon decentralized POMDPs. It is guaranteed to produce a policy that is provably at least

Decentralized stochastic planning with anonymity in interactions

This paper introduces a general model called Decentralized Stochastic Planning with Anonymous Interactions (D-SPAIT) to represent anonymity in interactions within the DecMDP framework, and develops optimization-based optimal and locally optimal solutions for generalizable sub-categories of D-SPAIT.

Solving Uncertain MDPs by Reusing State Information and Plans

This paper introduces a general framework that allows off-the-shelf MDP algorithms to solve Uncertain MDPs by planning based on currently available information and replanning if and when the problem changes.

Robust Policy Computation in Reward-Uncertain MDPs Using Nondominated Policies

This work develops new techniques for the robust optimization of IRMDPs, using the minimax regret decision criterion, that exploit the set of nondominated policies, i.e., policies that are optimal for some instantiation of the imprecise reward function.

Solving Uncertain Markov Decision Processes

The authors demonstrate that the uncertain model approach can be used to solve a class of nearly Markovian Decision Problems, providing lower bounds on performance in stochastic models with higher-order interactions.

Robust Markov Decision Processes

This work considers robust MDPs that offer probabilistic guarantees in view of the unknown parameters to counter the detrimental effects of estimation errors and determines a policy that attains the highest worst-case performance over this confidence region.

Parametric regret in uncertain Markov decision processes

  • Huan Xu, Shie Mannor
  • Computer Science
    Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference
  • 2009
It is shown that the problem of computing a minimax regret strategy is NP-hard, algorithms are proposed to find it efficiently under favorable conditions, and it is proved that such a strategy can be computed numerically in an efficient way.

Regret-based Reward Elicitation for Markov Decision Processes

Empirical results demonstrate that regret-based reward elicitation offers an effective way to produce near-optimal policies without resorting to the precise specification of the entire reward function.

Robust Control of Markov Decision Processes with Uncertain Transition Matrices

This work considers a robust control problem for a finite-state, finite-action Markov decision process, where uncertainty on the transition matrices is described in terms of possibly nonconvex sets, and shows that perfect duality holds for this problem, and that it can be solved with a variant of the classical dynamic programming algorithm, the "robust dynamic programming" algorithm.
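The robust dynamic programming idea summarized above can be sketched for the simplest case, a finite set of candidate transition matrices per action. The helper below is an illustrative sketch, not the authors' algorithm; it assumes a per-state ("rectangular") worst-case backup and hypothetical input conventions:

```python
import numpy as np

def robust_value_iteration(rewards, transition_sets, gamma=0.9, tol=1e-8):
    """Worst-case (maximin) value iteration over a finite uncertainty set.

    rewards: array of shape (S, A) with immediate rewards.
    transition_sets: list indexed by action; transition_sets[a] is a list of
        candidate (S, S) transition matrices (each row sums to 1).
    Returns the robust value function (shape (S,)) and a greedy policy (shape (S,)).
    """
    S, A = rewards.shape
    V = np.zeros(S)
    while True:
        Q = np.empty((S, A))
        for a in range(A):
            # Back up V under every candidate model, then take the
            # per-state worst case (the "rectangularity" assumption).
            backups = np.stack([P @ V for P in transition_sets[a]])  # (K, S)
            Q[:, a] = rewards[:, a] + gamma * backups.min(axis=0)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```

Because each state's worst case is chosen independently, the recursion retains the contraction property of standard value iteration and converges to a unique robust fixed point.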

A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes

This paper presents a new algorithm that, given only a generative model (a natural and common type of simulator) for an arbitrary MDP, performs on-line, near-optimal planning with a per-state running time that has no dependence on the number of states.
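The key property described above, planning cost independent of the number of states, can be illustrated with a minimal depth-limited recursion in the style of sparse sampling. The function below is a hedged sketch with illustrative names and signatures, not the paper's exact algorithm:

```python
def sparse_sampling(generative_model, actions, state, horizon, width, gamma=0.95):
    """Pick an action at `state` by depth-limited sparse sampling.

    generative_model(s, a) -> (next_state, reward) is a stochastic simulator.
    Runtime grows with `horizon` and `width` but not with the size of the
    state space, since only sampled successors are ever expanded.
    """
    def q_value(s, a, h):
        if h == 0:
            return 0.0
        total = 0.0
        for _ in range(width):  # draw `width` successor samples per (s, a)
            s2, r = generative_model(s, a)
            total += r + gamma * max(q_value(s2, a2, h - 1) for a2 in actions)
        return total / width
    return max(actions, key=lambda a: q_value(state, a, horizon))
```

The number of simulator calls is on the order of (width * |actions|)^horizon, so the method trades exponential dependence on the horizon for independence from the state-space size.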

Loss bounds for uncertain transition probabilities in Markov decision processes

The approach analyzes the growth of errors incurred by stepping backwards in time while precomputing value functions, which requires bounding a multilinear program.

Robust Dynamic Programming

  • G. Iyengar
  • Mathematics, Economics
    Math. Oper. Res.
  • 2005
It is proved that when this set of measures has a certain "rectangularity" property, all of the main results for finite and infinite horizon DP extend to natural robust counterparts.
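Under the rectangularity property mentioned above, the robust counterpart of the Bellman recursion takes a standard form. The notation below is a generic textbook sketch rather than the paper's own, with $\mathcal{P}_s^a$ denoting the uncertainty set of transition distributions for state $s$ and action $a$:

```latex
V(s) = \max_{a \in A} \; \min_{p \in \mathcal{P}_s^a}
    \left[ r(s,a) + \gamma \sum_{s'} p(s') \, V(s') \right]
```

Rectangularity means nature may pick the worst distribution independently for each state-action pair, which is what lets the finite- and infinite-horizon DP machinery carry over.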

Bounded-parameter Markov decision processes