# Reinforcement Learning for General LTL Objectives Is Intractable

```bibtex
@article{Yang2021ReinforcementLF,
  title   = {Reinforcement Learning for General LTL Objectives Is Intractable},
  author  = {Cambridge Yang and Michael S. Littman and Michael Carbin},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2111.12679}
}
```

In recent years, researchers have made significant progress in devising reinforcement-learning algorithms for optimizing linear temporal logic (LTL) objectives and LTL-like objectives. Despite these advancements, there are fundamental limitations to how well this problem can be solved that previous studies have alluded to but, to our knowledge, have not examined in depth. In this paper, we theoretically address the hardness of learning with general LTL objectives. We formalize the problem under…

## 2 Citations

A Framework for Transforming Specifications in Reinforcement Learning

- Computer Science, ArXiv
- 2021

This work defines the notion of sampling-based reduction to transform a given MDP into another one which can be simulated even when the transition probabilities of the original MDP are unknown, and formalizes the notions of preservation of optimal policies, convergence, and robustness of such reductions.

LiVe 2022: 6th Workshop on Learning in Verification (informal proceedings)

- Computer Science

With the growing popularity of machine learning, the quest for verifying data-driven models is attracting more and more attention, and researchers in automated verification are struggling to meet…

## References

Showing 1–10 of 46 references

A Framework for Transforming Specifications in Reinforcement Learning

- Computer Science, ArXiv
- 2021

This work defines the notion of sampling-based reduction to transform a given MDP into another one which can be simulated even when the transition probabilities of the original MDP are unknown, and formalizes the notions of preservation of optimal policies, convergence, and robustness of such reductions.

Reduced variance deep reinforcement learning with temporal logic specifications

- Computer Science, ICCPS
- 2019

This is the first model-free deep reinforcement learning algorithm that can synthesize policies that maximize the probability of satisfying an LTL specification even if AMECs do not exist.

Reinforcement Learning for Temporal Logic Control Synthesis with Probabilistic Satisfaction Guarantees

- Computer Science, 2019 IEEE 58th Conference on Decision and Control (CDC)
- 2019

A model-free reinforcement learning algorithm to synthesize control policies that maximize the probability of satisfying high-level control objectives given as Linear Temporal Logic formulas, in a setting even more general than a fully unknown MDP.

Efficient reinforcement learning

- Computer Science, COLT '94
- 1994

A new formal model for studying reinforcement learning, based on Valiant's PAC framework, that requires the learner to produce a policy whose expected value from the initial state is ε-close to that of the optimal policy, with probability no less than 1−δ.

On the sample complexity of reinforcement learning.

- Computer Science
- 2003

Novel algorithms with more restricted guarantees are suggested whose sample complexities are again independent of the size of the state space and depend linearly on the complexity of the policy class, but have only a polynomial dependence on the horizon time.

Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning

- Computer Science, NIPS
- 2015

The upper bound leverages Bernstein's inequality to improve on previous bounds for episodic finite-horizon MDPs, which have a time-horizon dependency of at least $H^3$.

Reinforcement learning with temporal logic rewards

- Computer Science, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2017

It is shown in simulated trials that learning is faster and policies obtained using the proposed approach outperform the ones learned using heuristic rewards in terms of the robustness degree, i.e., how well the tasks are satisfied.

Temporal-Logic-Based Reward Shaping for Continuing Learning Tasks

- Computer Science, ArXiv
- 2020

This paper presents the first reward shaping framework for average-reward learning and proves that, under standard assumptions, the optimal policy under the original reward function can be recovered.

LTL and Beyond: Formal Languages for Reward Function Specification in Reinforcement Learning

- Computer Science, Psychology, IJCAI
- 2019

This work proposes using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions, to ease the burden of complex reward function specification.

Mungojerrie: Reinforcement Learning of Linear-Time Objectives

- Computer Science, ArXiv
- 2021

Mungojerrie is a tool for testing reward schemes for ω-regular objectives on finite models and contains reinforcement learning algorithms and a probabilistic model checker.