Logically-Correct Reinforcement Learning

Mohammadhosein Hasanbeig, Alessandro Abate, Daniel Kroening
We propose a novel Reinforcement Learning (RL) algorithm to synthesize policies for a Markov Decision Process (MDP) such that a linear-time property is satisfied. We convert the property into a Limit Deterministic Büchi Automaton (LDBA), then construct a product MDP between the automaton and the original MDP. A reward function is then assigned to the states of the product automaton according to the accepting conditions of the LDBA. With this reward function, RL synthesizes a policy that satisfies…
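The product construction and the automaton-based reward described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the three-state MDP, the labels, the helper names (`automaton_step`, `build_product`, `reward`), and the hand-written two-state deterministic Büchi automaton for "visit `goal` infinitely often", which stands in for an LDBA (a deterministic automaton is a special case, so no epsilon-transitions appear). The reward of 1 on accepting automaton states is likewise a simplification of the paper's scheme.

```python
from itertools import product

# Hypothetical MDP: transitions[state][action] = {next_state: probability}
mdp_transitions = {
    "s0": {"a": {"s0": 0.5, "s1": 0.5}, "b": {"s2": 1.0}},
    "s1": {"a": {"s1": 1.0}},
    "s2": {"a": {"s2": 1.0}},
}
# Hypothetical labelling of MDP states with atomic propositions.
labels = {"s0": "none", "s1": "goal", "s2": "trap"}

# Hand-written automaton for "visit goal infinitely often" (assumed stand-in for an LDBA).
AUTOMATON_STATES = ["q0", "q1"]
ACCEPTING = {"q1"}


def automaton_step(q, label):
    """Move to the accepting state q1 whenever 'goal' is read, else to q0."""
    return "q1" if label == "goal" else "q0"


def build_product(transitions, labelling, automaton_states, step):
    """Pair every MDP state with every automaton state and synchronise the moves."""
    prod = {}
    for s, q in product(transitions, automaton_states):
        prod[(s, q)] = {}
        for action, dist in transitions[s].items():
            # The automaton reads the label of the MDP state being entered.
            prod[(s, q)][action] = {
                (s_next, step(q, labelling[s_next])): p
                for s_next, p in dist.items()
            }
    return prod


def reward(product_state):
    """Assumed reward: 1 when the automaton component is accepting, else 0."""
    _, q = product_state
    return 1.0 if q in ACCEPTING else 0.0


product_mdp = build_product(mdp_transitions, labels, AUTOMATON_STATES, automaton_step)
```

In this sketch, a standard RL method such as Q-learning could then be run on `product_mdp` with `reward` as the reward signal; a policy that keeps collecting reward keeps returning to accepting automaton states.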


