Long-Term Values in Markov Decision Processes, (Co)Algebraically

@inproceedings{Feys2018LongTermVI,
  title={Long-Term Values in Markov Decision Processes, (Co)Algebraically},
  author={Frank M. V. Feys and Helle Hvid Hansen and Lawrence S. Moss},
  booktitle={CMCS},
  year={2018}
}
This paper studies Markov decision processes (MDPs) from the categorical perspective of coalgebra and algebra. Probabilistic systems, which resemble MDPs but lack rewards, have been studied extensively, including coalgebraically, from the perspective of program semantics. In this paper, we focus instead on the role of MDPs as models in optimal planning, where the reward structure is central. The main contributions of this paper are (i) to give a coinductive explanation of policy improvement using a new proof…
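
For context, the policy improvement step that the paper explains coinductively is, classically, the greedy update sketched below (plain NumPy over a finite MDP; the array layout for `P`, `R`, and the discount `gamma` is an illustrative assumption, not the paper's formalization):

```python
import numpy as np

def policy_evaluation(P, R, gamma, pi):
    """Solve V = R_pi + gamma * P_pi V for a deterministic policy pi.

    P: transitions, shape (A, S, S); R: rewards, shape (A, S);
    pi: integer array of length S giving the chosen action in each state.
    """
    S = P.shape[1]
    P_pi = P[pi, np.arange(S), :]   # row s is P(. | s, pi(s))
    R_pi = R[pi, np.arange(S)]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)

def policy_improvement(P, R, gamma, V):
    """Return the policy that acts greedily w.r.t. the values V."""
    Q = R + gamma * P @ V           # action values, shape (A, S)
    return Q.argmax(axis=0)
```

Iterating evaluation and improvement until the policy stabilizes yields policy iteration; the paper's contribution is to revisit the improvement step coinductively.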

(Co)Algebraic Techniques for Markov Decision Processes

TLDR
This work is inspired by Bellman’s principle of optimality, which states that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
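
Concretely, the principle is captured by the Bellman optimality equation, which characterizes the optimal value function as a fixed point (standard discounted-MDP notation, assumed here rather than drawn from the paper):

```latex
V^*(s) \;=\; \max_{a \in A} \Big( r(s,a) \;+\; \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^*(s') \Big)
```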

CertRL: formalizing convergence proofs for value and policy iteration in Coq

TLDR
A Coq formalization of two canonical reinforcement learning algorithms, value iteration and policy iteration for finite-state Markov decision processes, using a contraction property of the Bellman optimality operator to establish that the resulting sequences converge in the infinite-horizon limit.
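
CertRL's convergence proofs rest on the Bellman optimality operator being a gamma-contraction in the sup norm; outside Coq, the same fact yields the familiar stopping rule sketched below (plain NumPy under standard assumptions, not the formalized development):

```python
import numpy as np

def value_iteration(P, R, gamma, tol=1e-8):
    """Iterate the Bellman optimality operator until the contraction
    bound guarantees sup-norm error below tol.

    P: (A, S, S) transition tensor; R: (A, S) rewards; 0 <= gamma < 1.
    """
    V = np.zeros(P.shape[1])
    while True:
        V_next = (R + gamma * P @ V).max(axis=0)   # Bellman backup
        # Contraction bound: ||V_next - V*|| <= gamma/(1-gamma) * ||V_next - V||
        if gamma * np.max(np.abs(V_next - V)) <= tol * (1 - gamma):
            return V_next
        V = V_next
```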

Categorical semantics of compositional reinforcement learning

TLDR
This work develops a framework for a compositional theory of RL using a categorical point of view and investigates sufficient conditions under which learning-by-parts results in the same optimal policy as learning on the whole.

Value iteration is optic composition

TLDR
It is shown that value improvement, one of the main steps of dynamic programming, can be naturally seen as composition in a category of optics, and intuitively, the optimal value function is the limit of a chain of optic compositions.
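
Setting the optics machinery aside, the limit in question is the standard dynamic-programming one: writing B for the value-improvement (Bellman) operator, the optimal value function is the limit of repeated composition (a textbook fact, stated here for orientation):

```latex
V^* \;=\; \lim_{n \to \infty} \big(\underbrace{B \circ \cdots \circ B}_{n}\big)(V_0) \qquad \text{for any initial value function } V_0 .
```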

Introspection Learning

TLDR
This paper presents Introspection Learning, an algorithm that allows for the asking of these types of questions of neural network policies, and demonstrates the usefulness of this algorithm both in the context of speeding up training and improving robustness with respect to safety constraints.

Representation and Invariance in Reinforcement Learning

TLDR
It is shown that three concrete mappings between various RL frameworks satisfy sufficient conditions and therefore preserve suitably-measured relative intelligence; an impossibility theorem about RL intelligence measurement is also proved.

References

Bisimulation for labelled Markov processes

TLDR
The main result is that a notion of bisimulation for Markov processes on Polish spaces, which extends the Larsen-Skou definition for discrete systems, is indeed an equivalence relation.
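
For orientation, the condition being extended says, roughly, that related states assign equal probabilities to every R-closed measurable set (a standard formulation with transition kernels τ_a; the notation is an assumption, not quoted from the paper):

```latex
s \mathrel{R} t \;\Longrightarrow\; \tau_a(s, C) = \tau_a(t, C) \quad \text{for every label } a \text{ and every } R\text{-closed measurable } C .
```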

Markov Decision Processes: Discrete Stochastic Dynamic Programming

  • M. Puterman
  • Computer Science
    Wiley Series in Probability and Statistics
  • 1994
TLDR
Markov Decision Processes covers recent research advances in areas such as countable state-space models with the average-reward criterion, constrained models, and models with risk-sensitive optimality criteria, and explores several topics that have received little or no attention in other books.

Coalgebraic analysis of subgame-perfect equilibria in infinite games without discounting

TLDR
A novel coalgebraic formulation of infinite extensive games is presented, proving a form of the one-deviation principle without discounting assumptions and suggesting that coalgebra supports a more adequate treatment of infinite-horizon models in game theory and economics.

Universal coalgebra: a theory of systems

Applications of Metric Coinduction

TLDR
This paper examines the application of the coinduction principle in a variety of areas, including infinite streams, Markov chains, Markov decision processes, and non-well-founded sets, and points to the usefulness of coinduction as a general proof technique.
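
The principle itself is brief enough to state (paraphrasing Kozen's formulation): if F is a contraction on a nonempty complete metric space and φ is a closed, nonempty property preserved by F, then φ holds of the unique fixed point x* = F(x*):

```latex
\big(\forall x.\ \varphi(x) \Rightarrow \varphi(F(x))\big) \;\Longrightarrow\; \varphi(x^*)
```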

A Categorical Approach to Probability Theory

TLDR
This work shows that the category ID of D-posets of fuzzy sets and sequentially continuous D-homomorphisms makes it possible to characterize the passage from classical to fuzzy events as the minimal generalization having a nontrivial quantum character.

Behavioral Metrics via Functor Lifting

TLDR
Two different approaches are presented, which can be viewed as generalizations of the Kantorovich and Wasserstein pseudometrics for probability measures; they coincide on several natural examples but differ in general.
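
The two classical pseudometrics on probability measures that these liftings generalize are, for reference (standard definitions; Γ(μ,ν) is the set of couplings of μ and ν):

```latex
K(\mu,\nu) \;=\; \sup_{f\ 1\text{-Lipschitz}} \Big| \int f\, d\mu - \int f\, d\nu \Big| , \qquad W(\mu,\nu) \;=\; \inf_{\omega \in \Gamma(\mu,\nu)} \int d(x,y)\, d\omega(x,y) .
```

Classically these agree by Kantorovich–Rubinstein duality; per the TLDR, the corresponding functor liftings coincide on several natural examples but can come apart in general.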

Generalizing determinization from automata to coalgebras

TLDR
This paper lifts the powerset construction from automata to the more general framework of coalgebras with structured state spaces and shows how to characterize coalgebraically several equivalences that have been objects of interest in the concurrency community, such as failure or ready semantics.
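
The classical instance being generalized is the subset construction for nondeterministic automata; a self-contained sketch follows (a textbook algorithm with illustrative names, not the paper's coalgebraic formulation):

```python
from itertools import chain

def determinize(alphabet, delta, start, accepting):
    """Subset construction: determinize an NFA.

    delta maps (state, symbol) to a set of successor states; DFA states
    are frozensets of NFA states, built on the fly from the start state
    so that unreachable subsets are never materialized.
    """
    start_set = frozenset([start])
    dfa_delta, seen, worklist = {}, {start_set}, [start_set]
    while worklist:
        current = worklist.pop()
        for symbol in alphabet:
            successor = frozenset(chain.from_iterable(
                delta.get((q, symbol), ()) for q in current))
            dfa_delta[(current, symbol)] = successor
            if successor not in seen:
                seen.add(successor)
                worklist.append(successor)
    return dfa_delta, start_set, {s for s in seen if s & set(accepting)}
```

In coalgebraic terms, this is one instance of lifting along the powerset monad; the paper's contribution is the general framework.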

Coinductive Proof Principles for Stochastic Processes

  • D. Kozen
  • Mathematics
    21st Annual IEEE Symposium on Logic in Computer Science (LICS'06)
  • 2006
We give an explicit coinduction principle for recursively-defined stochastic processes. The principle applies to any closed property, not just equality, and works even when solutions are not unique.

Trace semantics via determinization