• Corpus ID: 51867872

Teaching Multiple Tasks to an RL Agent using LTL

Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Anthony Valenzano, and Sheila A. McIlraith. Teaching Multiple Tasks to an RL Agent using LTL. In Adaptive Agents and Multi-Agent Systems.
This paper examines the problem of how to teach multiple tasks to a Reinforcement Learning (RL) agent. To this end, we use Linear Temporal Logic (LTL) as a language for specifying multiple tasks in a manner that supports the composition of learned skills. We also propose a novel algorithm that exploits LTL progression and off-policy RL to speed up learning without compromising convergence guarantees, and show that our method outperforms the state-of-the-art approach on randomly generated… 
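The core idea of LTL progression is that after each environment step, the task formula is rewritten into what remains to be satisfied, given the propositions just observed. The following is a minimal sketch, not the authors' implementation: the tuple encoding of formulas and the restriction to a small fragment (atoms, and, or, next, until, eventually) are assumptions made here for illustration.

```python
def _and(a, b):
    """Conjunction with on-the-fly simplification."""
    if "false" in (a, b): return "false"
    if a == "true": return b
    if b == "true": return a
    return ("and", a, b)

def _or(a, b):
    """Disjunction with on-the-fly simplification."""
    if "true" in (a, b): return "true"
    if a == "false": return b
    if b == "false": return a
    return ("or", a, b)

def prog(phi, sigma):
    """One-step LTL progression of formula phi through an observation
    sigma, given as the set of propositions true at this step."""
    if phi in ("true", "false"):
        return phi
    if isinstance(phi, str):                    # atomic proposition
        return "true" if phi in sigma else "false"
    op = phi[0]
    if op == "and":
        return _and(prog(phi[1], sigma), prog(phi[2], sigma))
    if op == "or":
        return _or(prog(phi[1], sigma), prog(phi[2], sigma))
    if op == "next":                            # X a: obligation moves one step
        return phi[1]
    if op == "until":                           # a U b == prog(b) or (prog(a) and a U b)
        return _or(prog(phi[2], sigma), _and(prog(phi[1], sigma), phi))
    if op == "eventually":                      # F b == true U b
        return _or(prog(phi[1], sigma), phi)
    raise ValueError(f"unknown operator: {op}")

# Example task: eventually (coffee and eventually office)
task = ("eventually", ("and", "coffee", ("eventually", "office")))
```

Progressing `task` through an observation containing `coffee` leaves an obligation mentioning only `office`; once `office` is later observed, the formula progresses to `"true"`, which is the signal an RL agent can use as a reward for task completion.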


LTL2Action: Generalizing LTL Instructions for Multi-Task RL

This work addresses the problem of teaching a deep reinforcement learning (RL) agent to follow instructions in multi-task environments by introducing an environment-agnostic LTL pretraining scheme which improves sample-efficiency in downstream environments and exploits the compositional syntax and semantics of LTL.

Advising and Instructing Reinforcement Learning Agents with LTL and Automata

Reinforcement learning (RL) involves an agent learning, through interaction with an environment, how to behave so as to maximize the expected cumulative reward (Sutton and Barto 1998).

From STL Rulebooks to Rewards

A principled approach is proposed to shaping rewards for reinforcement learning from multiple objectives that are given as a partially ordered set of signal temporal logic (STL) rules.

Modular Deep Reinforcement Learning with Temporal Logic Specifications

We propose an actor-critic, model-free, and online Reinforcement Learning (RL) framework for continuous-state, continuous-action Markov Decision Processes (MDPs) when the reward is highly sparse but…

Extended Markov Games to Learn Multiple Tasks in Multi-Agent Reinforcement Learning

This paper formally defines Extended Markov Games as a general mathematical model that allows multiple RL agents to concurrently learn various non-Markovian specifications and uses this model to train two different logic-based multi-agent RL algorithms to solve diverse settings of non-Markovian co-safe LTL specifications.

Evolutionary reinforcement learning for sparse rewards

GEATL is presented, the first hybrid on-policy evolutionary algorithm that combines the advantages of gradient learning in deep RL with the exploration ability of evolutionary algorithms, in order to solve the sparse-reward problem pertaining to temporal logic (TL) specifications.

Control Synthesis from Linear Temporal Logic Specifications using Model-Free Reinforcement Learning

We present a reinforcement learning (RL) framework to synthesize a control policy from a given linear temporal logic (LTL) specification in an unknown stochastic environment that can be modeled as a…

Generalizing LTL Instructions via Future Dependent Options

A novel multi-task RL algorithm with improved learning efficiency is presented that achieves global optimality of task completion, and the LTL generalization capability of agents trained by the proposed method is evaluated.

Deep Reinforcement Learning with Temporal Logics

This work proposes a deep Reinforcement Learning method for policy synthesis in continuous-state/action unknown environments, under requirements expressed in Linear Temporal Logic (LTL), and shows that this combination lifts the applicability of deep RL to complex temporal and memory-dependent policy synthesis goals.

Symbolic Plans as High-Level Instructions for Reinforcement Learning

An empirical evaluation shows that using high-level symbolic action models as a framework for defining final-state goal tasks, and automatically producing their corresponding reward functions, converges to near-optimal solutions faster than standard RL and HRL methods.

Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning

This work defines a novel method of multitask and transfer learning that enables an autonomous agent to learn how to behave in multiple tasks simultaneously, and then generalize its knowledge to new domains, and uses Atari games as a testing environment to demonstrate these methods.

Transfer from Multiple MDPs

The theoretical properties of this transfer method are investigated and novel algorithms adapting the transfer process on the basis of the similarity between source and target tasks are introduced.

Reinforcement Learning with Hierarchies of Machines

This work presents provably convergent algorithms for problem-solving and learning with hierarchical machines and demonstrates their effectiveness on a problem with several thousand states.

Environment-Independent Task Specifications via GLTL

A new task-specification language for Markov decision processes that is designed to be an improvement over reward functions by being environment independent and extended to probabilistic specifications in a way that permits approximations to be learned in finite time.

Integrating Human Demonstration and Reinforcement Learning : Initial Results in Human-Agent Transfer

This work introduces Human-Agent Transfer (HAT), a method that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in…

Reinforcement Learning with a Hierarchy of Abstract Models

Simulations on a set of compositionally-structured navigation tasks show that H-DYNA can learn to solve them faster than conventional RL algorithms, and the abstract models can be used to solve stochastic control tasks.

Reinforcement learning with temporal logic rewards

  • Xiao Li, C. Vasile, C. Belta
  • Computer Science
    2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
  • 2017
It is shown in simulated trials that learning is faster and policies obtained using the proposed approach outperform the ones learned using heuristic rewards in terms of the robustness degree, i.e., how well the tasks are satisfied.
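The "robustness degree" mentioned above refers to the quantitative semantics of temporal logic: instead of a true/false verdict, a formula is scored by how strongly a trace satisfies or violates it. A minimal sketch for two operators over finite traces follows; the function names and the margin-function convention (non-negative margin means the predicate holds) are illustrative assumptions, not this paper's API.

```python
def rho_eventually(trace, margin):
    """Robustness of 'eventually (margin >= 0)' over a finite trace:
    the best margin achieved at any step; positive iff the predicate
    is satisfied at some step."""
    return max(margin(x) for x in trace)

def rho_always(trace, margin):
    """Robustness of 'always (margin >= 0)': the worst margin over
    the trace; positive iff the predicate holds at every step."""
    return min(margin(x) for x in trace)
```

For example, with a trace of altitudes `[0.5, 1.2, 2.0]` and the margin `x - 1.0` (staying above 1.0), `rho_eventually` is positive (the threshold is eventually exceeded) while `rho_always` is negative (the first step violates it); such signed scores can serve directly as shaped rewards.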

Multitask reinforcement learning on the distribution of MDPs

  • F. Tanaka, M. Yamamura
  • Computer Science
    Proceedings 2003 IEEE International Symposium on Computational Intelligence in Robotics and Automation. Computational Intelligence in Robotics and Automation for the New Millennium (Cat. No.03EX694)
  • 2003
The central idea is to introduce an environmental class, BV-MDPs, defined with the distribution of MDPs, together with an approach to exploiting past learning experiences that focuses on statistics (mean and deviation) of the agent's value tables.

Hindsight Experience Replay

A novel technique is presented which allows sample-efficient learning from rewards that are sparse and binary, thereby avoiding the need for complicated reward engineering; it may be seen as a form of implicit curriculum.
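The relabeling trick behind Hindsight Experience Replay can be sketched in a few lines: alongside each transition stored with its original goal, also store a copy relabeled with a goal the agent actually reached, so even failed episodes yield transitions with positive reward. This is a minimal illustration of the "final" relabeling strategy under simplifying assumptions (discrete states, reward 1 exactly when the next state equals the goal); the function name is made up here, not from the paper's code.

```python
def her_relabel(episode):
    """Given an episode of (state, action, next_state, goal) tuples with
    sparse binary reward, return transitions under both the original goal
    and the goal actually achieved at the end of the episode."""
    achieved = episode[-1][2]  # final state the agent actually reached
    out = []
    for (s, a, s2, g) in episode:
        # Original goal: reward only if the intended goal was hit.
        out.append((s, a, s2, g, 1.0 if s2 == g else 0.0))
        # Hindsight goal: pretend the reached state was the goal all along.
        out.append((s, a, s2, achieved, 1.0 if s2 == achieved else 0.0))
    return out
```

On an episode that never reaches its intended goal, the relabeled copies still contain at least one reward-1 transition (the final step under the hindsight goal), which is what makes off-policy learning from sparse binary rewards sample-efficient.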

Multi-Task Reinforcement Learning: Shaping and Feature Selection

It is demonstrated that the most intuitive shaping function may not always be the best option, and that selecting the right representation results in improved generalization over tasks.