# Long-Term Values in Markov Decision Processes, (Co)Algebraically

@inproceedings{Feys2018LongTermVI, title={Long-Term Values in Markov Decision Processes, (Co)Algebraically}, author={Frank M. V. Feys and Helle Hvid Hansen and Lawrence S. Moss}, booktitle={CMCS}, year={2018} }

This paper studies Markov decision processes (MDPs) from the categorical perspective of coalgebra and algebra. Probabilistic systems, similar to MDPs but without rewards, have been extensively studied, also coalgebraically, from the perspective of program semantics. In this paper, we focus on the role of MDPs as models in optimal planning, where the reward structure is central. The main contributions of this paper are (i) to give a coinductive explanation of policy improvement using a new proof…

## 7 Citations

### (Co)Algebraic Techniques for Markov Decision Processes

- Computer Science
- 2019

This work is inspired by Bellman’s principle of optimality, which states that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
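For a discounted MDP, the principle is commonly expressed as the Bellman optimality equation. The notation below is an assumption for illustration and is not taken from the cited work: $S$ is the state set, $A$ the action set, $r$ a reward function, $P$ the transition probabilities, $\gamma \in [0, 1)$ a discount factor, and $V^*$ the optimal long-term value.

```latex
V^*(s) \;=\; \max_{a \in A} \Big[\, r(s, a) \;+\; \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^*(s') \,\Big]
```

Read this way, the value of acting optimally from $s$ is the best immediate reward plus the discounted value of acting optimally from whichever state follows, which is the "remaining decisions" clause of the principle.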

### CertRL: formalizing convergence proofs for value and policy iteration in Coq

- Computer Science
- CPP
- 2021

A Coq formalization of two canonical reinforcement learning algorithms, value iteration and policy iteration, for finite-state Markov decision processes, together with a contraction property of the Bellman optimality operator that is used to establish convergence of the iterates in the infinite-horizon limit.
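As a rough illustration of value iteration and of why contraction gives convergence, here is a minimal Python sketch. It is not the CertRL development or the method of any listed paper; the two-state MDP, the state-based rewards, and the names `bellman_update` and `value_iteration` are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

# transitions[state][action] is a list of (probability, next_state) pairs.
Transitions = Dict[str, Dict[str, List[Tuple[float, str]]]]


def bellman_update(values: Dict[str, float], rewards: Dict[str, float],
                   transitions: Transitions, gamma: float) -> Dict[str, float]:
    """Apply the Bellman optimality operator once (rewards assumed per state)."""
    return {
        s: rewards[s] + gamma * max(
            sum(p * values[t] for p, t in outcomes)
            for outcomes in transitions[s].values()
        )
        for s in transitions
    }


def value_iteration(rewards: Dict[str, float], transitions: Transitions,
                    gamma: float = 0.9, tol: float = 1e-10,
                    max_iters: int = 10_000) -> Dict[str, float]:
    """Iterate the operator from the zero vector until updates fall below tol."""
    values = {s: 0.0 for s in transitions}
    for _ in range(max_iters):
        new_values = bellman_update(values, rewards, transitions, gamma)
        gap = max(abs(new_values[s] - values[s]) for s in values)
        values = new_values
        if gap < tol:
            break
    return values


# Hypothetical two-state MDP, invented purely for illustration.
rewards = {"low": 0.0, "high": 1.0}
transitions: Transitions = {
    "low": {"stay": [(1.0, "low")], "move": [(0.5, "low"), (0.5, "high")]},
    "high": {"stay": [(1.0, "high")], "move": [(1.0, "low")]},
}
optimal_values = value_iteration(rewards, transitions)
```

Because the operator is a contraction in the sup norm with factor `gamma` (here assumed to be strictly below 1), each application shrinks the distance between any two value vectors by at least that factor, so the loop approaches a unique fixed point regardless of the starting vector.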

### Categorical semantics of compositional reinforcement learning

- Computer Science
- 2022

This work develops a framework for a compositional theory of RL using a categorical point of view and investigates sufficient conditions under which learning-by-parts results in the same optimal policy as learning on the whole.

### Value iteration is optic composition

- Computer Science, Mathematics
- 2022

It is shown that value improvement, one of the main steps of dynamic programming, can be naturally seen as composition in a category of optics, and intuitively, the optimal value function is the limit of a chain of optic compositions.

### Introspection Learning.

- Computer Science
- 2019

This paper presents Introspection Learning, an algorithm that allows for the asking of these types of questions of neural network policies, and demonstrates the usefulness of this algorithm both in the context of speeding up training and improving robustness with respect to safety constraints.

### Introspection Learning

- Computer Science
- ArXiv
- 2019

This paper presents Introspection Learning, an algorithm that allows for the asking of these types of questions of neural network policies, and demonstrates the usefulness of this algorithm both in the context of speeding up training and improving robustness with respect to safety constraints.

### Representation and Invariance in Reinforcement Learning

- Computer Science
- ArXiv
- 2021

This work shows that three concrete mappings between various RL frameworks satisfy sufficient conditions and therefore preserve suitably measured relative intelligence, and it proves an impossibility theorem about RL intelligence measurement.

## References

### Bisimulation for labelled Markov processes

- Mathematics, Computer Science
- Proceedings of Twelfth Annual IEEE Symposium on Logic in Computer Science
- 1997

The main result is that a notion of bisimulation for Markov processes on Polish spaces, which extends the Larsen-Skou definition for discrete systems, is indeed an equivalence relation.

### Markov Decision Processes: Discrete Stochastic Dynamic Programming

- Computer Science
- Wiley Series in Probability and Statistics
- 1994

Markov Decision Processes covers recent research advances in such areas as countable state space models with average reward criterion, constrained models, and models with risk sensitive optimality criteria, and explores several topics that have received little or no attention in other books.

### Coalgebraic analysis of subgame-perfect equilibria in infinite games without discounting

- Economics
- Mathematical Structures in Computer Science
- 2015

A novel coalgebraic formulation of infinite extensive games is presented, which proves a form of one-deviation principle without any such assumptions and suggests that coalgebra supports a more adequate treatment of infinite-horizon models in game theory and economics.

### Applications of Metric Coinduction

- Mathematics
- CALCO
- 2007

This paper examines the application of the coinduction principle in a variety of areas, including infinite streams, Markov chains, Markov decision processes, and non-well-founded sets, and points to the usefulness of coinduction as a general proof technique.

### A Categorical Approach to Probability Theory

- Computer Science, Mathematics
- Stud Logica
- 2010

This work shows that the category ID of D-posets of fuzzy sets and sequentially continuous D-homomorphisms allows one to characterize the passage from classical to fuzzy events as the minimal generalization having nontrivial quantum character.

### Behavioral Metrics via Functor Lifting

- Mathematics
- FSTTCS
- 2014

Two different approaches are presented, both of which can be viewed as generalizations of the Kantorovich and Wasserstein pseudometrics for probability measures; they coincide on several natural examples but differ in general.

### Generalizing determinization from automata to coalgebras

- Computer Science
- Log. Methods Comput. Sci.
- 2013

This paper lifts the powerset construction from automata to the more general framework of coalgebras with structured state spaces and shows how to characterize coalgebraically several equivalences which have been objects of interest in the concurrency community, such as failure or ready semantics.

### Coinductive Proof Principles for Stochastic Processes

- Mathematics
- 21st Annual IEEE Symposium on Logic in Computer Science (LICS'06)
- 2006

We give an explicit coinduction principle for recursively-defined stochastic processes. The principle applies to any closed property, not just equality, and works even when solutions are not unique.…