Algebraic optimization of sequential decision problems

@article{Dressler2022AlgebraicOO,
  title={Algebraic optimization of sequential decision problems},
  author={Mareike Dressler and Marina Garrote-L{\'o}pez and Guido Mont{\'u}far and Johannes M{\"u}ller and Kemal Rose},
  journal={ArXiv},
  year={2022},
  volume={abs/2211.09439}
}
We study the optimization of the expected long-term reward in finite partially observable Markov decision processes over the set of stationary stochastic policies. In the case of deterministic observations, also known as state aggregation, the problem is equivalent to optimizing a linear objective subject to quadratic constraints. We characterize the feasible set of this problem as the intersection of a product of affine varieties of rank-one matrices and a polytope. Based on this description…
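The structure described in this abstract can be illustrated with a minimal pure-Python sketch. The sketch makes assumptions that are not taken from the paper: the discounted criterion, with frequencies normalized by (1 - gamma); a randomly generated 4-state, 2-action MDP; and a hypothetical observation map `obs = [0, 0, 1, 1]`. All function names are likewise illustrative.

For an observation-based policy pi(a | o), the state-action frequencies are eta(s, a) = rho(s) * pi(a | obs[s]). This form has three consequences:

- the expected reward is linear in eta;
- eta satisfies linear flow constraints, which cut out a polytope;
- within each observation class, the block of eta is a rank-one matrix, so all of its 2x2 minors (quadratic polynomials) vanish.

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(n):
            if i != col:
                factor = M[i][col] / M[col][col]
                for j in range(col, n + 1):
                    M[i][j] -= factor * M[col][j]
    return [M[i][n] / M[i][i] for i in range(n)]


def random_distribution(k, rng):
    """Random probability vector of length k with strictly positive entries."""
    w = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]


def state_action_frequencies(P, mu, pi, obs, gamma):
    """Discounted state-action frequencies eta(s, a) = rho(s) * pi(a | obs[s])."""
    n, k = len(mu), len(pi[0])
    # Policy-induced kernel P_pi[s][t] = sum_a pi(a | obs[s]) * P(t | s, a).
    P_pi = [[sum(pi[obs[s]][a] * P[s][a][t] for a in range(k)) for t in range(n)]
            for s in range(n)]
    # rho solves rho(t) - gamma * sum_s rho(s) * P_pi[s][t] = (1 - gamma) * mu(t).
    A = [[(1.0 if s == t else 0.0) - gamma * P_pi[s][t] for s in range(n)]
         for t in range(n)]
    rho = solve(A, [(1 - gamma) * m for m in mu])
    eta = [[rho[s] * pi[obs[s]][a] for a in range(k)] for s in range(n)]
    return eta, P_pi


def expected_return(P_pi, r, mu, pi, obs, gamma):
    """J = sum_s mu(s) V(s), where V solves V = r_pi + gamma * P_pi V."""
    n, k = len(mu), len(pi[0])
    r_pi = [sum(pi[obs[s]][a] * r[s][a] for a in range(k)) for s in range(n)]
    B = [[(1.0 if s == t else 0.0) - gamma * P_pi[s][t] for t in range(n)]
         for s in range(n)]
    V = solve(B, r_pi)
    return sum(m * v for m, v in zip(mu, V))


def flow_residual(eta, P, mu, gamma):
    """Largest violation of the linear flow constraints that define the polytope."""
    n, k = len(eta), len(eta[0])
    return max(abs(sum(eta[t]) - gamma * sum(P[s][a][t] * eta[s][a]
                                             for s in range(n) for a in range(k))
                   - (1 - gamma) * mu[t]) for t in range(n))


def max_minor(eta, obs):
    """Largest |2x2 minor| inside each observation block (zero iff every block has rank <= 1)."""
    n, k = len(eta), len(eta[0])
    return max(abs(eta[s][a] * eta[t][b] - eta[s][b] * eta[t][a])
               for s in range(n) for t in range(n) if obs[s] == obs[t]
               for a in range(k) for b in range(k))


rng = random.Random(0)
n_states, n_actions, gamma = 4, 2, 0.9
obs = [0, 0, 1, 1]  # hypothetical aggregation: states 0, 1 -> obs 0; states 2, 3 -> obs 1
P = [[random_distribution(n_states, rng) for _ in range(n_actions)] for _ in range(n_states)]
r = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
mu = random_distribution(n_states, rng)  # initial-state distribution
pi = [random_distribution(n_actions, rng) for _ in range(2)]  # pi[o][a] = pi(a | o)

eta, P_pi = state_action_frequencies(P, mu, pi, obs, gamma)
objective = sum(eta[s][a] * r[s][a] for s in range(n_states) for a in range(n_actions))
J = expected_return(P_pi, r, mu, pi, obs, gamma)
print(f"linear objective {objective:.6f} vs (1 - gamma) * J {(1 - gamma) * J:.6f}")
print(f"flow residual {flow_residual(eta, P, mu, gamma):.1e}, max minor {max_minor(eta, obs):.1e}")
```

Replacing `pi` with a state-dependent policy generally makes `max_minor` nonzero. This is one way to see that the observation-based policies are singled out by the rank-one (quadratic) constraints, not by the polytope alone.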


References

Showing references 1-10 of 37.

The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs

This work considers the problem of finding the best memoryless stochastic policy for an infinite-horizon partially observable Markov decision process (POMDP) with finite state and action spaces, with respect to either the discounted or the mean reward criterion. It formulates this problem as a linear optimization problem over the set of feasible state-action frequencies, subject to polynomial constraints.

On the Computational Complexity of Stochastic Controller Optimization in POMDPs

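The work first shows that finding an optimal stochastic blind controller for a Markov decision process is NP-hard. This result establishes that the more general problem of stochastic controller optimization in POMDPs is also NP-hard. The work also outlines a special case that is convex and admits efficient global solutions.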

Geometry and Determinism of Optimal Stationary Control in Partially Observable Markov Decision Processes

It is shown that every POMDP has an optimal memoryless policy of limited stochasticity. This reduces the dimensionality of the search space and, on the evaluated systems, yields better and faster convergence of policy gradient methods.

Geometric Policy Iteration for Markov Decision Processes

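This work proposes a new algorithm, Geometric Policy Iteration (GPI), for solving discounted MDPs. It proves that the complexity of GPI achieves the best known bound for policy iteration, $O\big(\frac{|\mathcal{A}|}{1-\gamma}\log\frac{1}{1-\gamma}\big)$, where $\mathcal{A}$ is the action set and $\gamma$ the discount factor.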
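For orientation, here is a minimal sketch of classical (Howard) policy iteration on a randomly generated discounted MDP. This is the baseline whose best known complexity bound GPI is reported to match. It is not GPI itself, since GPI's update rule is not described here. The MDP sizes, the seed, gamma = 0.9, and all function names are illustrative assumptions.

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(n):
            if i != col:
                factor = M[i][col] / M[col][col]
                for j in range(col, n + 1):
                    M[i][j] -= factor * M[col][j]
    return [M[i][n] / M[i][i] for i in range(n)]


def random_distribution(k, rng):
    """Random probability vector of length k with strictly positive entries."""
    w = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]


def q_value(s, a, V, P, r, gamma):
    """One-step lookahead value r(s, a) + gamma * sum_t P(t | s, a) * V(t)."""
    return r[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))


def evaluate(policy, P, r, gamma):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = r_pi for a deterministic policy."""
    n = len(policy)
    A = [[(1.0 if s == t else 0.0) - gamma * P[s][policy[s]][t] for t in range(n)]
         for s in range(n)]
    return solve(A, [r[s][policy[s]] for s in range(n)])


def policy_iteration(P, r, gamma, max_iter=1000):
    """Classical policy iteration: alternate exact evaluation and greedy improvement."""
    n, k = len(P), len(P[0])
    policy = [0] * n
    for _ in range(max_iter):
        V = evaluate(policy, P, r, gamma)
        new_policy = list(policy)
        for s in range(n):
            best = max(range(k), key=lambda a: q_value(s, a, V, P, r, gamma))
            # Switch only on a strict improvement so tied actions cannot cause cycling.
            if q_value(s, best, V, P, r, gamma) > q_value(s, policy[s], V, P, r, gamma) + 1e-12:
                new_policy[s] = best
        if new_policy == policy:
            return policy, V
        policy = new_policy
    return policy, V


rng = random.Random(1)
n_states, n_actions, gamma = 5, 3, 0.9
P = [[random_distribution(n_states, rng) for _ in range(n_actions)] for _ in range(n_states)]
r = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
policy, V = policy_iteration(P, r, gamma)
print("greedy policy:", policy)
print("values:", [round(v, 4) for v in V])
```

The improvement step switches every improvable state at once. The loop terminates after finitely many iterations because each switch strictly increases the policy's value, and at termination `V` satisfies the Bellman optimality equation up to rounding.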

Solving POMDPs using quadratically constrained linear programs

This work describes a new approach that addresses the space requirement of POMDP algorithms while maintaining well-defined optimality guarantees.

Global Optimization with Polynomials and the Problem of Moments

It is shown that two problems reduce to solving an (often finite) sequence of convex linear matrix inequality (LMI) problems: finding the unconstrained global minimum of a real-valued polynomial $p(x)\colon \mathbb{R}^n \to \mathbb{R}$, and finding its global minimum over a compact set $K$ defined by polynomial inequalities.
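To make "a sequence of LMI problems" concrete for the constrained case, below is the standard form of the order-$r$ moment relaxation in Lasserre's hierarchy. The notation ($g_j$, $y_\alpha$, $L_y$, $M_r$, $v_j$) is introduced only for this sketch, and the paper should be consulted for the precise assumptions on $K$.

```latex
p^\ast = \min_{x \in K} p(x), \qquad
K = \{\, x \in \mathbb{R}^n : g_j(x) \ge 0,\ j = 1, \dots, m \,\},

\rho_r = \min_{y = (y_\alpha)_{|\alpha| \le 2r}} \ L_y(p) := \sum_{\alpha} p_\alpha\, y_\alpha
\quad \text{s.t.} \quad
y_0 = 1, \quad M_r(y) \succeq 0, \quad
M_{r - v_j}(g_j\, y) \succeq 0 \ \ (j = 1, \dots, m),

v_j = \lceil \deg g_j / 2 \rceil, \qquad
r \ge \max\{\lceil \deg p / 2 \rceil,\ v_1, \dots, v_m\}.
```

Here $M_r(y)$ is the moment matrix with entries $y_{\alpha+\beta}$ for $|\alpha|, |\beta| \le r$, and $M_{r-v_j}(g_j\, y)$ is the corresponding localizing matrix of $g_j$. Each $\rho_r$ is the value of a semidefinite program, that is, an LMI problem. The moments $y_\alpha = (x^\ast)^\alpha$ of a minimizer $x^\ast$ are feasible, so $\rho_r \le \rho_{r+1} \le p^\ast$. Under a compactness (Archimedean) condition on the description of $K$, $\rho_r \to p^\ast$.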

Nonlinear Optimal Control via Occupation Measures and LMI-Relaxations

This work provides a simple hierarchy of linear matrix inequality (LMI) relaxations. Under some convexity assumptions, their optimal values form a nondecreasing sequence of lower bounds on the optimal value of the optimal control problem (OCP).

Survey of linear programming for standard and nonstandard Markovian control problems. Part I: Theory

This paper gives an overview of linear programming methods for solving standard and nonstandard Markovian control problems, and a particular class of stochastic games.
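As a reference point, the textbook linear program for a finite discounted MDP is written below in terms of state-action frequencies $\eta(s,a)$. The survey covers many more variants, for example average-reward and constrained problems. The notation here ($P$, $r$, $\mu$, $\gamma$, $\eta$) is assumed for illustration rather than taken from the survey.

```latex
\max_{\eta \ge 0} \; \sum_{s,a} r(s,a)\, \eta(s,a)
\quad \text{s.t.} \quad
\sum_{a} \eta(s',a) - \gamma \sum_{s,a} P(s' \mid s,a)\, \eta(s,a) = (1-\gamma)\, \mu(s')
\quad \text{for all } s'.
```

The feasible set is a polytope. An optimal stationary policy can be read off an optimal solution as $\pi(a \mid s) = \eta(s,a) / \sum_{b} \eta(s,b)$ wherever the denominator is positive. Restricting to policies that depend only on a deterministic observation adds quadratic constraints to this linear program, which is roughly the setting of the abstract above.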

Algebraic Degree of Polynomial Optimization

It is proved that the optimality conditions always hold at optimizers and that the coordinates of optimizers are algebraic functions of the coefficients of the input polynomials. A general formula is also given for the algebraic degree of the optimal coordinates.

Solving infinite-horizon POMDPs with memoryless stochastic policies in state-action space

This work presents an approach for Reward Optimization in State-Action space (ROSA) and finds that ROSA is computationally efficient and can yield stability improvements over other existing methods.