Corpus ID: 244527237

Reinforcement Learning for General LTL Objectives Is Intractable

@article{Yang2021ReinforcementLF,
  title={Reinforcement Learning for General LTL Objectives Is Intractable},
  author={Cambridge Yang and Michael S. Littman and Michael Carbin},
  journal={ArXiv},
  year={2021},
  volume={abs/2111.12679}
}
In recent years, researchers have made significant progress in devising reinforcement-learning algorithms for optimizing linear temporal logic (LTL) objectives and LTL-like objectives. Despite these advancements, there are fundamental limitations on how well this problem can be solved, limitations that previous studies have alluded to but, to our knowledge, have not examined in depth. In this paper, we theoretically address the hardness of learning with general LTL objectives. We formalize the problem under…
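For orientation, the sketch below illustrates the product construction that most of the algorithms discussed on this page build on: run the MDP alongside an automaton derived from the LTL formula and reward automaton acceptance. This is a minimal, hypothetical Python illustration (all names, labels, and dynamics are invented here), not the paper's construction.

```python
import random

# Hedged sketch: learning with a (co-safe) LTL objective via the product of
# an MDP with an automaton for the formula. Everything below is illustrative.

# Automaton for "eventually goal" (F goal): state 0 = not yet, 1 = accepted.
def automaton_step(q, label):
    return 1 if (q == 1 or label == "goal") else 0

# Toy MDP: three states, the last labeled "goal"; transitions are stochastic.
LABELS = {0: "other", 1: "other", 2: "goal"}

def mdp_step(s, a):
    # Hypothetical dynamics: action 1 drifts toward the goal state.
    if a == 1:
        return min(s + 1, 2) if random.random() < 0.8 else s
    return max(s - 1, 0)

def product_step(s, q, a):
    """One step of the product MDP; reward 1 on first acceptance."""
    s2 = mdp_step(s, a)
    q2 = automaton_step(q, LABELS[s2])
    reward = 1.0 if (q == 0 and q2 == 1) else 0.0
    return s2, q2, reward

# A random-policy rollout over the product state (s, q).
s, q, total = 0, 0, 0.0
for _ in range(20):
    s, q, r = product_step(s, q, random.choice([0, 1]))
    total += r
print("return:", total)
```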

Citations

A Framework for Transforming Specifications in Reinforcement Learning
TLDR
This work defines the notion of sampling-based reduction to transform a given MDP into another one that can be simulated even when the transition probabilities of the original MDP are unknown, and formalizes the notions of preservation of optimal policies, convergence, and robustness of such reductions.
LiVe 2022: 6th Workshop on Learning in Verification (informal proceedings)
With the growing popularity of machine learning, the quest for verifying data-driven models is attracting more and more attention, and researchers in automated verification are struggling to meet…

References

Showing 1–10 of 46 references
A Framework for Transforming Specifications in Reinforcement Learning
TLDR
This work defines the notion of sampling-based reduction to transform a given MDP into another one that can be simulated even when the transition probabilities of the original MDP are unknown, and formalizes the notions of preservation of optimal policies, convergence, and robustness of such reductions.
Reduced variance deep reinforcement learning with temporal logic specifications
TLDR
This is the first model-free deep reinforcement learning algorithm that can synthesize policies that maximize the probability of satisfying an LTL specification even if AMECs do not exist.
Reinforcement Learning for Temporal Logic Control Synthesis with Probabilistic Satisfaction Guarantees
TLDR
A model-free reinforcement learning algorithm to synthesize control policies that maximize the probability of satisfying high-level control objectives given as Linear Temporal Logic formulas, in a setting even more general than a fully unknown MDP.
Efficient reinforcement learning
TLDR
A new formal model for studying reinforcement learning, based on Valiant's PAC framework, that requires the learner to produce a policy whose expected value from the initial state is ε-close to that of the optimal policy, with probability no less than 1 − δ.
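For orientation, the ε-δ guarantee this entry describes can be written in standard notation (assumed here, not quoted from the entry) as:

```latex
% PAC-style guarantee: the learned policy \hat{\pi} is \varepsilon-optimal
% from the initial state s_0 with probability at least 1 - \delta.
\[
  \Pr\!\left[\, V^{\hat{\pi}}(s_0) \;\ge\; V^{\pi^*}(s_0) - \varepsilon \,\right] \;\ge\; 1 - \delta
\]
```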
On the sample complexity of reinforcement learning.
TLDR
Novel algorithms with more restricted guarantees are suggested whose sample complexities are again independent of the size of the state space and depend linearly on the complexity of the policy class, but have only a polynomial dependence on the horizon time.
Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning
TLDR
The upper bound leverages Bernstein's inequality to improve on previous bounds for episodic finite-horizon MDPs, which have a time-horizon dependency of at least $H^3$.
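For reference, one standard form of Bernstein's inequality (the paper's exact application is not reproduced here): for i.i.d. $X_i$ with mean $\mu$, variance $\sigma^2$, and $|X_i - \mu| \le b$,

```latex
% Bernstein's inequality, empirical-mean form (standard statement):
\[
  \Pr\!\left[\,\Bigl|\tfrac{1}{n}\textstyle\sum_{i=1}^{n} X_i - \mu\Bigr| > t\,\right]
  \;\le\; 2\exp\!\left(-\frac{n t^2}{2\sigma^2 + \tfrac{2}{3}\,b\,t}\right)
\]
```

Its variance term (rather than only a range term, as in Hoeffding's inequality) is what enables the sharper horizon dependence the entry mentions.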
Reinforcement learning with temporal logic rewards
  • Xiao Li, C. Vasile, C. Belta
  • Computer Science
    2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
  • 2017
TLDR
It is shown in simulated trials that learning is faster and policies obtained using the proposed approach outperform the ones learned using heuristic rewards in terms of the robustness degree, i.e., how well the tasks are satisfied.
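As a hedged illustration of a quantitative robustness degree in the spirit of this entry (not the paper's exact TLTL semantics): for "eventually (distance to goal < 1)", the robustness of a trajectory is the best margin achieved at any time step, positive iff the task is satisfied.

```python
# Hypothetical robustness degree for "eventually (dist_to_goal < threshold)":
# max over time of (threshold - dist_t); larger means more robust satisfaction.
def robustness_eventually(dists, threshold=1.0):
    return max(threshold - d for d in dists)

print(robustness_eventually([3.0, 2.2, 0.4]))  # 0.6 > 0: satisfied
print(robustness_eventually([3.0, 2.5, 1.8]))  # -0.8 < 0: violated
```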
Temporal-Logic-Based Reward Shaping for Continuing Learning Tasks
TLDR
This paper presents the first reward shaping framework for average-reward learning and proves that, under standard assumptions, the optimal policy under the original reward function can be recovered.
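For orientation, the classical discounted-setting potential-based shaping rule is shown below; the cited paper's average-reward variant is not reproduced here.

```latex
% Classical potential-based reward shaping (Ng et al.; discounted setting,
% shown for orientation only): shaped reward adds a potential difference.
\[
  r'(s, a, s') \;=\; r(s, a, s') \;+\; \gamma\,\Phi(s') - \Phi(s)
\]
```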
LTL and Beyond: Formal Languages for Reward Function Specification in Reinforcement Learning
TLDR
This work proposes using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions, to ease the burden of complex reward function specification.
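A minimal sketch of the reward-machine idea, assuming a simplified structure (the cited formalism attaches reward functions, not just scalar rewards, to edges):

```python
# Hedged sketch of a reward machine (RM): an automaton over high-level event
# labels whose transitions emit rewards. Structure is illustrative only.
class RewardMachine:
    def __init__(self, transitions, initial_state):
        # transitions: (state, label) -> (next_state, reward)
        self.transitions = transitions
        self.state = initial_state

    def step(self, label):
        # Missing edges are treated as reward-free self-loops.
        self.state, reward = self.transitions.get(
            (self.state, label), (self.state, 0.0))
        return reward

# RM for "reach a, then b": reward 1 only when b is seen after a.
rm = RewardMachine({("u0", "a"): ("u1", 0.0),
                    ("u1", "b"): ("u2", 1.0)}, "u0")
print([rm.step(l) for l in ["b", "a", "b"]])  # [0.0, 0.0, 1.0]
```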
Mungojerrie: Reinforcement Learning of Linear-Time Objectives
TLDR
Mungojerrie is a tool for testing reward schemes for ω-regular objectives on finite models and contains reinforcement learning algorithms and a probabilistic model checker.