Corpus ID: 235457996

Mungojerrie: Reinforcement Learning of Linear-Time Objectives

@article{Hahn2021MungojerrieRL,
  title={Mungojerrie: Reinforcement Learning of Linear-Time Objectives},
  author={E. M. Hahn and Mateo Perez and S. Schewe and F. Somenzi and A. Trivedi and D. Wojtczak},
  journal={ArXiv},
  year={2021},
  volume={abs/2106.09161}
}
Reinforcement learning synthesizes controllers without prior knowledge of the system. At each timestep, a reward is given. The controllers optimize the discounted sum of these rewards. Applying this class of algorithms requires designing a reward scheme, which is typically done manually. The designer must ensure that their intent is accurately captured. This may not be trivial, and is prone to error. An alternative to this manual programming, akin to programming directly in assembly, is to…
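
For reference, the objective mentioned above, the discounted sum of rewards, is the standard expected discounted return; in the usual notation (not quoted from the paper), with reward r_t at step t and discount factor γ,

    \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t}\right], \qquad 0 \le \gamma < 1 .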


References

Showing 1-10 of 37 references
Control Synthesis from Linear Temporal Logic Specifications using Model-Free Reinforcement Learning
We present a reinforcement learning (RL) framework to synthesize a control policy from a given linear temporal logic (LTL) specification in an unknown stochastic environment that can be modeled as a…
Omega-Regular Objectives in Model-Free Reinforcement Learning
This work presents a constructive reduction from the almost-sure satisfaction of ω-regular objectives to an almost-sure reachability problem, and extends this technique to learning how to control an unknown model so that the chance of satisfying the objective is maximized.
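
As a rough illustration of this kind of reduction (a sketch only; the cited paper's construction is more involved, and the names product_mdp, accepting_edges, and zeta are assumptions of this example), accepting edges of a product MDP can be redirected, with some probability, to a fresh absorbing target state, so that maximizing the probability of reaching the target approximates maximizing the probability of satisfying the acceptance condition:

    def buchi_to_reachability(product_mdp, accepting_edges, zeta=0.99):
        """Sketch: turn a Buchi-style acceptance condition on a product MDP into
        a reachability objective by redirecting accepting edges to an absorbing
        target state with probability 1 - zeta (illustrative only).

        product_mdp:     dict (state, action) -> list of (next_state, prob)
        accepting_edges: set of (state, action, next_state) triples
        """
        TARGET = "target"
        new_mdp = {}
        for (s, a), successors in product_mdp.items():
            redirected = []
            for (t, p) in successors:
                if (s, a, t) in accepting_edges:
                    # with probability 1 - zeta, jump to the target; otherwise continue
                    redirected.append((TARGET, p * (1.0 - zeta)))
                    redirected.append((t, p * zeta))
                else:
                    redirected.append((t, p))
            new_mdp[(s, a)] = redirected
        new_mdp[(TARGET, "stay")] = [(TARGET, 1.0)]  # the target is absorbing
        return new_mdp, TARGET
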
Reinforcement Learning: An Introduction
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Q-learning
This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
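
For orientation, a minimal tabular sketch of the algorithm the theorem concerns (standard Q-learning, not code from the cited paper; the env interface with reset(), step(), and actions is a hypothetical stand-in):

    import random
    from collections import defaultdict

    def q_learning(env, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
        """Tabular Q-learning sketch. env is assumed to expose reset() -> state,
        step(action) -> (next_state, reward, done), and a list env.actions."""
        Q = defaultdict(float)  # Q[(state, action)] -> estimated value
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # epsilon-greedy exploration keeps every action repeatedly sampled,
                # which is the condition the convergence theorem requires
                if random.random() < epsilon:
                    a = random.choice(env.actions)
                else:
                    a = max(env.actions, key=lambda b: Q[(s, b)])
                s_next, r, done = env.step(a)
                best_next = max(Q[(s_next, b)] for b in env.actions)
                # one-step update toward r + gamma * max_a' Q(s', a')
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s_next
        return Q
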
A learning based approach to control synthesis of Markov decision processes for linear temporal logic specifications
We propose to synthesize a control policy for a Markov decision process (MDP) such that the resulting traces of the MDP satisfy a linear temporal logic (LTL) property. We construct a product MDP that…
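
The product construction mentioned here pairs MDP states with the states of an automaton tracking the property; a bare-bones sketch (the names and the convention that the automaton reads the label of the successor state are assumptions of this example, not taken from the cited paper):

    def product_mdp(mdp_transitions, automaton_delta, labeling, q0):
        """Sketch of an MDP x automaton product (illustrative only; the cited
        paper's construction also tracks acceptance information).

        mdp_transitions: dict (s, a) -> list of (s', prob)
        automaton_delta: dict (q, label) -> q'   (assumed total and deterministic)
        labeling:        dict s -> label
        q0:              initial automaton state
        """
        automaton_states = {q0} | {q for (q, _) in automaton_delta}
        product = {}
        for (s, a), successors in mdp_transitions.items():
            for q in automaton_states:
                # the automaton advances on the label of the successor MDP state
                product[((s, q), a)] = [
                    ((s2, automaton_delta[(q, labeling[s2])]), p)
                    for (s2, p) in successors
                ]
        return product
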
A Simple Algorithm for Solving Qualitative Probabilistic Parity Games
This paper exploits a simple and only mildly adjusted algorithm from the analysis of non-probabilistic systems, and uses it to show that the qualitative analysis of probabilistic games inherits the much celebrated sub-exponential complexity from 2-player games.
Model-Free Reinforcement Learning for Stochastic Parity Games
A streamlined reduction from 2½-player parity games to reachability games that avoids recourse to nondeterminism is presented, and model-free reinforcement learning algorithms, such as minimax Q-learning, can be used to approximate the value and mutual best-response strategies for both players in the underlying stochastic parity game.
Solving Rubik's Cube with a Robot Hand
OpenAI, I. Akkaya, et al. ArXiv, 2019.
It is demonstrated that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot, made possible by a novel algorithm, which is called automatic domain randomization (ADR), and a robot platform built for machine learning.
Human-level control through deep reinforcement learning
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Double Q-learning
An alternative way to approximate the maximum expected value for any set of random variables is introduced, and the obtained double estimator method is shown to sometimes underestimate rather than overestimate the maximum expected value.
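
A compact sketch of the resulting update rule (standard double Q-learning; the helper name and tabular representation are choices made for this example):

    import random
    from collections import defaultdict

    def double_q_update(QA, QB, s, a, r, s2, actions, alpha=0.1, gamma=0.99):
        """One double Q-learning step: one estimator selects the maximizing
        action, the other evaluates it, avoiding the single-estimator
        overestimation of the maximum expected value (sketch only)."""
        if random.random() < 0.5:
            a_star = max(actions, key=lambda b: QA[(s2, b)])   # QA selects
            target = r + gamma * QB[(s2, a_star)]              # QB evaluates
            QA[(s, a)] += alpha * (target - QA[(s, a)])
        else:
            a_star = max(actions, key=lambda b: QB[(s2, b)])   # QB selects
            target = r + gamma * QA[(s2, a_star)]              # QA evaluates
            QB[(s, a)] += alpha * (target - QB[(s, a)])

    # usage sketch: QA, QB = defaultdict(float), defaultdict(float); call
    # double_q_update after each observed transition while acting greedily
    # with respect to QA[(s, a)] + QB[(s, a)].
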