EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL

Thomas Carta, Sylvain Lamprier, Pierre-Yves Oudeyer, Olivier Sigaud
Reinforcement learning (RL) in long-horizon, sparse-reward tasks is notoriously difficult and requires many training steps. A standard solution to speed up the process is to leverage additional reward signals, shaping them to better guide the learning process. In the context of language-conditioned RL, the abstraction and generalisation properties of the language input provide opportunities for more efficient ways of shaping the reward. In this paper, we leverage this idea and propose an…



Using Natural Language for Reward Shaping in Reinforcement Learning

This work proposes the LanguagE-Action Reward Network (LEARN), a framework that maps free-form natural language instructions to intermediate rewards based on actions taken by the agent that can seamlessly be integrated into any standard reinforcement learning algorithm.
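As a rough illustration of the idea behind LEARN (not code from the paper), a language-derived intermediate reward can simply be added to the environment reward at each step; `language_reward` here is a stand-in for the trained model, and `toy_model` is a purely hypothetical scorer:

```python
def augmented_reward(r_env, instruction, actions, language_reward, weight=0.1):
    # language_reward: stand-in for a trained model (as in LEARN) that scores
    # how well the agent's recent actions match the free-form instruction.
    # The weighted score is added to the environment reward, so the sum can
    # be fed to any standard RL algorithm unchanged.
    return r_env + weight * language_reward(instruction, actions)

# Toy stand-in: fraction of recent action names mentioned in the instruction.
toy_model = lambda instr, acts: sum(a in instr for a in acts) / len(acts)

r = augmented_reward(0.0, "go left then jump", ["left", "jump", "right"],
                     toy_model, weight=0.5)
```

The point is only the shape of the interface: the language model produces a scalar per step, and the RL algorithm never needs to know where that scalar came from.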

Learning to Understand Goal Specifications by Modelling Reward

A framework within which instruction-conditional RL agents are trained using rewards obtained not from the environment but from reward models that are jointly trained from expert examples; this allows an agent to adapt to changes in the environment without requiring new expert examples.

Improving Intrinsic Exploration with Language Abstractions

This work evaluates whether language can improve over existing exploration methods by directly extending (and comparing to) competitive intrinsic exploration baselines: AMIGo (Campero et al., 2021) and NovelD (Zhang et al., 2021).

Modular Multitask Reinforcement Learning with Policy Sketches

Experiments show that using the approach to learn policies guided by sketches gives better performance than existing techniques for learning task-specific or shared policies, while naturally inducing a library of interpretable primitive behaviors that can be recombined to rapidly adapt to new tasks.

Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey

At the intersection of deep RL and developmental approaches, a typology is proposed of methods in which deep RL algorithms are trained to tackle the developmental robotics problem of autonomously acquiring open-ended repertoires of skills.

Grounding Language to Autonomously-Acquired Skills via Goal Generation

This work proposes a new conceptual approach to language-conditioned RL: the Language-Goal-Behavior architecture (LGB), which decouples skill learning and language grounding via an intermediate semantic representation of the world.

Language as an Abstraction for Hierarchical Deep Reinforcement Learning

This paper introduces an open-source object interaction environment built using the MuJoCo physics engine and the CLEVR engine, and finds that, using the approach, agents can learn to solve diverse, temporally extended tasks such as object sorting and multi-object rearrangement, including from raw pixel observations.

Asking for Knowledge: Training RL Agents to Query External Knowledge Using Language

The AFK agent is proposed, which learns to generate language commands to query for meaningful knowledge that helps solve the tasks and outperforms recent baselines on the challenging Q-BabyAI and Q-TextWorld environments.

A Survey of Reinforcement Learning Informed by Natural Language

The time is right to investigate a tight integration of natural language understanding into reinforcement learning; the state of the field is surveyed, including work on instruction following, text games, and learning from textual domain knowledge.

Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping

Conditions under which modifications to the reward function of a Markov decision process preserve the optimal policy are investigated, shedding light on the practice of reward shaping, a method used in reinforcement learning whereby additional training rewards are used to guide the learning agent.