Algebraic optimization of sequential decision problems
@article{Dressler2022AlgebraicOO, title={Algebraic optimization of sequential decision problems}, author={Mareike Dressler and Marina Garrote-L{\'o}pez and Guido Mont{\'u}far and Johannes M{\"u}ller and Kemal Rose}, journal={ArXiv}, year={2022}, volume={abs/2211.09439} }
We study the optimization of the expected long-term reward in finite partially observable Markov decision processes over the set of stationary stochastic policies. In the case of deterministic observations, also known as state aggregation, the problem is equivalent to optimizing a linear objective subject to quadratic constraints. We characterize the feasible set of this problem as the intersection of a product of affine varieties of rank-one matrices and a polytope. Based on this description…
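To make the quadratic structure concrete, the following is a minimal sketch of the state-aggregation formulation under the mean-reward criterion; the notation ($\eta$ for state-action frequencies, $\alpha$ for the transition kernel, $\beta$ for the deterministic observation map, $r$ for the reward) is assumed here, since the truncated abstract fixes none of it:

\[
\begin{aligned}
\max_{\eta \in \mathbb{R}^{S \times A}} \quad & \textstyle\sum_{s,a} r(s,a)\,\eta_{s,a} && \text{(linear objective)} \\
\text{s.t.} \quad & \eta_{s,a} \ge 0, \quad \textstyle\sum_{s,a} \eta_{s,a} = 1 && \text{(polytope)} \\
& \textstyle\sum_{a} \eta_{s',a} = \sum_{s,a} \alpha(s' \mid s,a)\,\eta_{s,a} \quad \forall s' && \text{(polytope: stationarity)} \\
& \operatorname{rank}\bigl( (\eta_{s,a})_{s \in \beta^{-1}(o),\, a \in A} \bigr) \le 1 \quad \forall o && \text{(rank-one varieties)}
\end{aligned}
\]

The rank-one conditions encode that the policy depends on the state only through its observation; they are equivalent to the vanishing of all $2 \times 2$ minors of each observation block, which is where the quadratic constraints come from.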
References
The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs
- Mathematics, ICLR
- 2022
This work considers the problem of finding the best memoryless stochastic policy for an infinite-horizon partially observable Markov decision process with finite state and action spaces, under either the discounted or the mean reward criterion, and describes the optimization problem as a linear optimization problem over the set of feasible state-action frequencies subject to polynomial constraints.
On the Computational Complexity of Stochastic Controller Optimization in POMDPs
- Computer Science, TOCT
- 2012
This work establishes that the more general problem of stochastic controller optimization in POMDPs is also NP-hard, and identifies a special case that is convex and admits efficient global solutions.
Geometry and Determinism of Optimal Stationary Control in Partially Observable Markov Decision Processes
- Computer Science, arXiv
- 2015
It is shown that any POMDP has an optimal memoryless policy of limited stochasticity, which allows reducing the dimensionality of the search space and enables better and faster convergence of the policy gradient on the evaluated systems.
Geometric Policy Iteration for Markov Decision Processes
- Computer Science, KDD
- 2022
This work proposes a new algorithm, Geometric Policy Iteration (GPI), to solve discounted MDPs, and proves that GPI achieves the best known complexity bound of policy iteration, $O\!\left(\frac{|\mathcal{A}|}{1-\gamma} \log \frac{1}{1-\gamma}\right)$.
Solving POMDPs using quadratically constrained linear programs
- Computer Science, AAMAS '06
- 2006
This work describes a new approach that addresses the space requirement of POMDP algorithms while maintaining well-defined optimality guarantees.
Global Optimization with Polynomials and the Problem of Moments
- Mathematics, SIAM J. Optim.
- 2001
It is shown that the problem of finding the global minimum of a real-valued polynomial $p(x) \colon \mathbb{R}^n \to \mathbb{R}$, either unconstrained or over a compact set $K$ defined by polynomial inequalities, reduces to solving an (often finite) sequence of convex linear matrix inequality (LMI) problems.
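As a minimal textbook illustration of the moment approach (not an example drawn from the paper): to bound the global minimum of the univariate quartic $p(x) = x^4 - x^2$, replace each monomial $x^k$ by a moment variable $y_k$ and require the moment matrix to be positive semidefinite, which is an LMI:

\[
\min_{x \in \mathbb{R}} \; x^4 - x^2 \;\;\ge\;\; \min_{y} \; y_4 - y_2
\quad \text{s.t.} \quad
M(y) = \begin{pmatrix} 1 & y_1 & y_2 \\ y_1 & y_2 & y_3 \\ y_2 & y_3 & y_4 \end{pmatrix} \succeq 0 .
\]

Here $M(y) \succeq 0$ relaxes the requirement that $y_k = \mathbb{E}[x^k]$ for some probability measure; since every nonnegative univariate polynomial is a sum of squares, this first relaxation is already exact and returns the true minimum $-1/4$.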
Nonlinear Optimal Control via Occupation Measures and LMI-Relaxations
- Mathematics, SIAM J. Control Optim.
- 2008
This work provides a simple hierarchy of LMI (linear matrix inequality) relaxations whose optimal values form a nondecreasing sequence of lower bounds on the optimal value of the optimal control problem (OCP), with convergence under some convexity assumptions.
Survey of linear programming for standard and nonstandard Markovian control problems. Part I: Theory
- Mathematics, Math. Methods Oper. Res.
- 1994
This paper gives an overview of linear programming methods for solving standard and nonstandard Markovian control problems, and a particular class of stochastic games.
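For orientation, the best-known instance of this linear-programming approach is the discounted occupation-measure LP for fully observable MDPs (stated here in common textbook notation, which is an assumption rather than the survey's own):

\[
\max_{x \ge 0} \; \sum_{s,a} r(s,a)\, x_{s,a}
\quad \text{s.t.} \quad
\sum_{a} x_{s',a} - \gamma \sum_{s,a} p(s' \mid s,a)\, x_{s,a} = \mu(s') \quad \forall s' ,
\]

where $\mu$ is the initial state distribution; an optimal policy can be read off from any optimal solution via $\pi(a \mid s) \propto x_{s,a}$.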
Algebraic Degree of Polynomial Optimization
- Mathematics, Computer Science, SIAM J. Optim.
- 2009
It is proved that the optimality conditions always hold at optimizers and that the coordinates of optimizers are algebraic functions of the coefficients of the input polynomials; a general formula is given for the algebraic degree of the optimal coordinates.
Solving infinite-horizon POMDPs with memoryless stochastic policies in state-action space
- Computer Science, arXiv
- 2022
This work presents an approach for Reward Optimization in State-Action space (ROSA) and finds that ROSA is computationally efficient and can yield stability improvements over existing methods.