Long-Term Values in Markov Decision Processes, (Co)Algebraically

@inproceedings{Feys2018LongTermVI,
  title={Long-Term Values in Markov Decision Processes, (Co)Algebraically},
  author={Frank M. V. Feys and Helle Hvid Hansen and Lawrence S. Moss},
  booktitle={CMCS},
  year={2018}
}
This paper studies Markov decision processes (MDPs) from the categorical perspective of coalgebra and algebra. Probabilistic systems, similar to MDPs but without rewards, have been extensively studied, also coalgebraically, from the perspective of program semantics. In this paper, we focus on the role of MDPs as models in optimal planning, where the reward structure is central. The main contributions of this paper are (i) to give a coinductive explanation of policy improvement using a new proof… 
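
The policy improvement step mentioned above is the classical greedy-improvement step from dynamic programming; the paper's contribution is a coinductive proof about it, but the step itself can be sketched on a small finite MDP. The two-state MDP below (states, transition probabilities, rewards, discount factor) is purely illustrative and not taken from the paper.

# Minimal sketch of classical policy improvement on a tiny, made-up MDP.
# States, actions, rewards, and the discount factor are illustrative only.
import numpy as np

states = [0, 1]
actions = [0, 1]
gamma = 0.9

# P[a][s, s'] = probability of moving from s to s' under action a
P = {
    0: np.array([[0.8, 0.2], [0.1, 0.9]]),
    1: np.array([[0.3, 0.7], [0.6, 0.4]]),
}
# R[a][s] = expected immediate reward for taking action a in state s
R = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 2.0])}

def evaluate(policy):
    # Solve the linear system V = R_pi + gamma * P_pi V for a deterministic policy.
    P_pi = np.array([P[policy[s]][s] for s in states])
    R_pi = np.array([R[policy[s]][s] for s in states])
    return np.linalg.solve(np.eye(len(states)) - gamma * P_pi, R_pi)

def improve(policy):
    # Greedy improvement: in each state, pick the action with the best one-step lookahead.
    V = evaluate(policy)
    return [max(actions, key=lambda a: R[a][s] + gamma * P[a][s] @ V) for s in states]

policy = [0, 0]
while True:
    new_policy = improve(policy)
    if new_policy == policy:   # no state changes its action, so the policy is optimal
        break
    policy = new_policy
print("optimal policy:", policy, "values:", evaluate(policy))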

(Co)Algebraic Techniques for Markov Decision Processes

This work is inspired by Bellman’s principle of optimality, which states that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
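
For a finite MDP with transition probabilities P(s' | s, a), expected rewards r(s, a), and discount factor γ ∈ [0, 1), this principle is usually written as the Bellman optimality equation (standard textbook notation, not notation taken from the paper):

\[
  V^*(s) \;=\; \max_{a}\Big( r(s,a) \;+\; \gamma \sum_{s'} P(s' \mid s,a)\, V^*(s') \Big)
\]

The optimal value of a state is the best immediate reward plus the discounted optimal value of the state that the first decision leads to, which is exactly the decomposition the principle describes.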

Categorical semantics of compositional reinforcement learning

This work develops a framework for a compositional theory of RL using a categorical point of view and investigates sufficient conditions under which learning-by-parts results in the same optimal policy as learning on the whole.

Value iteration is optic composition

It is shown that value improvement, one of the main steps of dynamic programming, can be naturally seen as composition in a category of optics, and intuitively, the optimal value function is the limit of a chain of optic compositions.
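
Read concretely, the "limit of a chain" is ordinary value iteration: repeatedly applying the Bellman optimality operator (one value-improvement step) to any starting value function converges to the optimal value function. A minimal sketch on a randomly generated, purely illustrative MDP:

# Value iteration as repeated application of the Bellman optimality operator T.
# The 3-state, 2-action MDP below is generated at random purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a, s, s']
R = rng.uniform(0, 1, size=(n_actions, n_states))                  # R[a, s]

def bellman(V):
    # One value-improvement step: (T V)(s) = max_a ( R[a, s] + gamma * sum_s' P[a, s, s'] V(s') )
    return np.max(R + gamma * P @ V, axis=0)

V = np.zeros(n_states)
for _ in range(200):                 # the chain V, T V, T^2 V, ... converges geometrically
    V_next = bellman(V)
    if np.max(np.abs(V_next - V)) < 1e-10:
        break
    V = V_next
print("approximate optimal values:", V)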

Introspection Learning

This paper presents Introspection Learning, an algorithm for asking questions of this kind about neural network policies, and demonstrates its usefulness both for speeding up training and for improving robustness with respect to safety constraints.

Representation and Invariance in Reinforcement Learning

This paper lays foundations for studying relative-intelligence-preserving mappability between RL frameworks, and shows that whether such mappings are possible depends on the RL frameworks in question and on how intelligence is measured.

CertRL: formalizing convergence proofs for value and policy iteration in Coq

A Coq formalization of two canonical reinforcement learning algorithms, value and policy iteration for finite-state Markov decision processes, which uses a contraction property of the Bellman optimality operator to establish that the iterates converge in the infinite-horizon limit.
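
The contraction property in question is the standard one: the Bellman optimality operator shrinks sup-norm distances between value functions by at least the discount factor, which gives convergence via Banach's fixed-point theorem. A quick numerical illustration on a randomly generated, purely illustrative MDP:

# Numerically illustrating gamma-contraction of the Bellman optimality operator
# in the sup norm, on a randomly generated (purely illustrative) finite MDP.
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, gamma = 4, 3, 0.8
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a, s, s']
R = rng.uniform(0, 1, size=(n_actions, n_states))                  # R[a, s]

def T(V):
    return np.max(R + gamma * P @ V, axis=0)

for _ in range(5):
    V1, V2 = rng.normal(size=n_states), rng.normal(size=n_states)
    lhs = np.max(np.abs(T(V1) - T(V2)))      # ||T V1 - T V2||_inf
    rhs = gamma * np.max(np.abs(V1 - V2))    # gamma * ||V1 - V2||_inf
    assert lhs <= rhs + 1e-12                # holds for every pair of value functions
    print(f"{lhs:.4f} <= {rhs:.4f}")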

References

Bisimulation for labelled Markov processes

The main result is that a notion of bisimulation for Markov processes on Polish spaces, which extends the Larsen-Skou definition for discrete systems, is indeed an equivalence relation.
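
For finite discrete systems, the Larsen-Skou definition referred to here can be decided by partition refinement: two states are bisimilar exactly when, for every label, they assign equal probability to every block of the final partition. A minimal sketch, with a made-up labelled Markov chain:

# Partition-refinement sketch of Larsen-Skou probabilistic bisimulation for a
# finite labelled Markov chain. States and probabilities are hypothetical;
# trans[(state, label)] maps successor states to probabilities.
from collections import defaultdict

states = ["s0", "s1", "t0", "t1", "u"]
labels = ["a"]
trans = {
    ("s0", "a"): {"s1": 0.5, "t1": 0.5},
    ("t0", "a"): {"s1": 0.5, "t1": 0.5},
    ("s1", "a"): {"u": 1.0},
    ("t1", "a"): {"u": 1.0},
    ("u", "a"):  {},
}

def signature(state, blocks):
    # Probability that `state` steps, per label, into each current block.
    sig = []
    for lab in labels:
        dist = trans.get((state, lab), {})
        sig.append(tuple(round(sum(p for s, p in dist.items() if s in block), 10)
                         for block in blocks))
    return tuple(sig)

# Start with one block containing every state, then split until stable.
blocks = [frozenset(states)]
while True:
    new_blocks = []
    for block in blocks:
        groups = defaultdict(set)
        for s in block:
            groups[signature(s, blocks)].add(s)
        new_blocks.extend(frozenset(g) for g in groups.values())
    if len(new_blocks) == len(blocks):   # no block was split: partition is stable
        break
    blocks = new_blocks

print(blocks)  # s0 ~ t0 and s1 ~ t1 end up in the same blocks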

Markov Decision Processes: Discrete Stochastic Dynamic Programming

  • M. Puterman
  • Wiley Series in Probability and Statistics
  • 1994
Markov Decision Processes covers recent research advances in such areas as countable-state-space models with the average-reward criterion, constrained models, and models with risk-sensitive optimality criteria, and explores several topics that have received little or no attention in other books.

Coalgebraic analysis of subgame-perfect equilibria in infinite games without discounting

A novel coalgebraic formulation of infinite extensive games is presented, which proves a form of the one-deviation principle without discounting assumptions and suggests that coalgebra supports a more adequate treatment of infinite-horizon models in game theory and economics.

Metrics for Finite Markov Decision Processes

The formulation of metrics for measuring the similarity of states in a finite Markov decision process is based on the notion of bisimulation for MDPs, with the aim of solving discounted infinite-horizon reinforcement learning tasks.

Universal coalgebra: a theory of systems

Applications of Metric Coinduction

This paper examines the application of the metric coinduction principle in a variety of areas, including infinite streams, Markov chains, Markov decision processes, and non-well-founded sets, and points to the usefulness of coinduction as a general proof technique.

A Categorical Approach to Probability Theory

This work shows that the category ID of D-posets of fuzzy sets and sequentially continuous D-homomorphisms makes it possible to characterize the passage from classical to fuzzy events as the minimal generalization having nontrivial quantum character.

Behavioral Metrics via Functor Lifting

Two different approaches are presented, which can be viewed as generalizations of the Kantorovich and Wasserstein pseudometrics for probability measures; they coincide on several natural examples but differ in general.
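
As background for the pseudometrics being lifted, the Kantorovich distance between two finite probability distributions is the value of a small transportation linear program. The sketch below is illustrative only: the distributions p and q and the ground distances d are made up, and scipy.optimize.linprog is used to solve the program.

# Kantorovich (optimal transport) distance between two finite distributions,
# computed as a linear program over couplings gamma[i, j] >= 0.
import numpy as np
from scipy.optimize import linprog

p = np.array([0.5, 0.3, 0.2])           # source distribution (made up)
q = np.array([0.2, 0.2, 0.6])           # target distribution (made up)
d = np.array([[0.0, 1.0, 2.0],           # d[i, j] = ground distance between points i and j
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])

n = len(p)
# Variables: coupling gamma[i, j], flattened row-major.
# Constraints: row sums equal p, column sums equal q.
A_eq = np.zeros((2 * n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1.0     # sum_j gamma[i, j] = p[i]
    A_eq[n + i, i::n] = 1.0              # sum_k gamma[k, i] = q[i]
b_eq = np.concatenate([p, q])

res = linprog(d.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
print("Kantorovich distance:", res.fun)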

Generalizing determinization from automata to coalgebras

This paper lifts the powerset construction from automata to the more general framework of coalgebras with structured state spaces and shows how to characterize coalgebraically several equivalences that have been objects of interest in the concurrency community, such as failure or ready semantics.
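
In its classical form, the powerset construction being generalized here is the textbook subset construction that determinizes a nondeterministic finite automaton; a small self-contained sketch, with a made-up NFA:

# Classical powerset (subset) construction: determinizing an NFA.
# The NFA below (states, alphabet, transitions, accepting states) is a made-up example.
from collections import deque

alphabet = {"a", "b"}
# delta[(state, symbol)] = set of successor states
delta = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
start, accepting = 0, {2}

def determinize():
    start_set = frozenset({start})
    dfa_trans, seen, queue = {}, {start_set}, deque([start_set])
    while queue:
        current = queue.popleft()
        for sym in alphabet:
            # DFA successor: union of NFA successors of every state in the current set
            successor = frozenset(s for q in current for s in delta.get((q, sym), set()))
            dfa_trans[(current, sym)] = successor
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    dfa_accepting = {S for S in seen if S & accepting}
    return start_set, dfa_trans, dfa_accepting

start_set, dfa_trans, dfa_accepting = determinize()
print(len({S for S, _ in dfa_trans}), "reachable DFA states")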

Coinductive Proof Principles for Stochastic Processes

  • D. Kozen
  • 21st Annual IEEE Symposium on Logic in Computer Science (LICS'06)
  • 2006
We give an explicit coinduction principle for recursively-defined stochastic processes. The principle applies to any closed property, not just equality, and works even when solutions are not unique.