Corpus ID: 239050438

Hierarchical Skills for Efficient Exploration

@inproceedings{Gehring2021HierarchicalSF,
  title={Hierarchical Skills for Efficient Exploration},
  author={Jonas Gehring and Gabriel Synnaeve and Andreas Krause and Nicolas Usunier},
  booktitle={Neural Information Processing Systems},
  year={2021}
}
In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration. However, prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control) and specificity (faster learning) in skill design. In previous work on continuous control, the sensitivity of methods to this trade-off has not been addressed explicitly, as locomotion provides a suitable prior for navigation tasks, which have been of… 

Residual Skill Policies: Learning an Adaptable Skill-based Action Space for Reinforcement Learning for Robotics

This work proposes accelerating exploration in the skill space with state-conditioned generative models that bias the high-level agent towards sampling only skills relevant to the current state, based on prior experience, together with a low-level residual policy for fine-grained skill adaptation that lets downstream RL agents adapt to unseen task variations.
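The residual composition lends itself to a short sketch. The following minimal PyTorch rendering is an illustration under assumed names (ResidualSkillPolicy, skill_decoder, and the network sizes are not from the paper's code): a frozen pre-trained skill decoder proposes a base action for the sampled skill latent, and a small residual network adds a state-conditioned correction.

import torch
import torch.nn as nn

class ResidualSkillPolicy(nn.Module):
    """Hypothetical sketch: frozen skill decoder plus learned residual."""
    def __init__(self, state_dim, skill_dim, action_dim, skill_decoder):
        super().__init__()
        self.skill_decoder = skill_decoder  # pre-trained, kept frozen
        self.residual = nn.Sequential(
            nn.Linear(state_dim + skill_dim + action_dim, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, state, z):
        with torch.no_grad():  # the decoder is not updated downstream
            base_action = self.skill_decoder(state, z)
        x = torch.cat([state, z, base_action], dim=-1)
        return base_action + self.residual(x)  # fine-grained correction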

TempoRL: Temporal Priors for Exploration in Off-Policy Reinforcement Learning

TempoRL introduces state-independent temporal priors, which directly model temporal consistency in demonstrated trajectories. Empirical evidence shows that these priors can leverage task-agnostic trajectories to accelerate learning and drive exploration in complex tasks, even when trained on data collected on simpler tasks.

SFP: State-free Priors for Exploration in Off-Policy Reinforcement Learning

This work introduces state-free priors, which directly model temporal consistency in demonstrated trajectories and can drive exploration in complex tasks even when trained on data collected on simpler tasks. It also introduces a novel integration scheme for action priors in off-policy reinforcement learning that dynamically samples actions from a probabilistic mixture of policy and action prior.
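The mixture-based integration can be sketched in a few lines. The interface below is an assumption for illustration (policy, prior, and prior.sample are hypothetical names, not the paper's code): with probability alpha the agent acts from the state-conditioned task policy, otherwise from the state-free prior, which conditions only on its own recurrent summary of past actions.

import torch

def sample_action(policy, prior, state, prior_hidden, alpha=0.5):
    # Bernoulli(alpha) switch between task policy and action prior
    if torch.rand(()) < alpha:
        action = policy(state).sample()  # state-conditioned task policy
    else:
        # the state-free prior sees no observation, only its own history
        action, prior_hidden = prior.sample(prior_hidden)
    return action, prior_hidden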

Pretraining in Deep Reinforcement Learning: A Survey

This survey seeks to systematically review existing works in pretraining for deep reinforcement learning, provide a taxonomy of these methods, discuss each sub-field, and bring attention to open problems and future directions.

Deep Hierarchical Planning from Pixels

This work introduces Director, a practical method for learning hierarchical behaviors directly from pixels by planning inside the latent space of a learned world model; the decisions are interpretable because the world model can decode goals into images for visualization.
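A hedged sketch of such a control loop (function names and the decision interval K are assumptions, not the released code): a manager picks a latent goal every K steps inside the world model's state space, and a goal-conditioned worker chooses primitive actions.

K = 8  # manager decision interval (assumed)

def act(world_model, manager, worker, obs, step, goal):
    latent = world_model.encode(obs)        # abstract model state
    if step % K == 0:
        goal = manager.select_goal(latent)  # pick a new latent goal
        # world_model.decode(goal) would render the goal as an image,
        # which is what makes the manager's decisions interpretable
    action = worker.act(latent, goal)       # goal-conditioned low level
    return action, goal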

Hierarchical quality-diversity for online damage recovery

The Hierarchical Trial and Error algorithm is introduced, which uses a hierarchical behavioural repertoire to learn diverse skills and leverages them to make the robot more adaptive to different situations; the hierarchical decomposition of skills enables the robot to learn more complex behaviours while keeping learning of the repertoire tractable.

Online Damage Recovery for Physical Robots with Hierarchical Quality-Diversity

The Hierarchical Trial and Error algorithm uses a hierarchical behavioural repertoire to learn diverse skills and leverages them to make the robot adapt quickly in the physical world; the hierarchical decomposition of skills enables the robot to learn more complex behaviours while keeping learning of the repertoire tractable.

Hierarchical Strategies for Cooperative Multi-Agent Reinforcement Learning

A two-level hierarchical architecture combines a novel information-theoretic objective with a trajectory prediction model to learn a “strategy”, and encourages each agent to behave according to that strategy by conditioning its local Q-functions on z_A and z_R, thus outperforming all existing methods.

Leveraging Demonstrations with Latent Space Priors

This work proposes to leverage demonstration datasets by combining skill learning and sequence modeling and shows how to acquire latent space priors from state-only motion capture demonstrations and explores several methods for integrating them into policy learning on transfer tasks.

Choreographer: Learning and Adapting Skills in Imagination

This work presents Choreographer, a model-based agent that exploits its world model to learn and adapt skills in imagination, and decouples the exploration and skill learning processes, being able to discover skills in the latent state space of the model.

References

SHOWING 1-10 OF 54 REFERENCES

Sub-policy Adaptation for Hierarchical Reinforcement Learning

A novel algorithm discovers a set of skills and continuously adapts them along with the higher level, even when training on a new task; the work introduces Hierarchical Proximal Policy Optimization (HiPPO), an on-policy method to efficiently train all levels of the hierarchy jointly.

Data-Efficient Hierarchical Reinforcement Learning

This paper studies how to develop HRL algorithms that are general, in that they do not make onerous additional assumptions beyond standard RL algorithms, and efficient, in the sense that they can be used with modest numbers of interaction samples, making them suitable for real-world problems such as robotic control.

Accelerating Reinforcement Learning with Learned Skill Priors

This work proposes a deep latent variable model that jointly learns an embedding space of skills and the skill prior from offline agent experience, and extends common maximum-entropy RL approaches to use skill priors to guide downstream learning.
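The extension amounts to replacing the entropy bonus in a max-entropy RL objective with a KL penalty toward the learned skill prior; a sketch in assumed notation (π is the high-level policy over skill latents z, p the learned prior):

% Sketch (notation assumed): max-entropy RL with the entropy bonus
% replaced by a KL term toward the learned skill prior p(z|s).
J(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t} r(s_t, z_t)
         - \alpha\, D_{\mathrm{KL}}\big(\pi(z_t \mid s_t)\,\|\,p(z_t \mid s_t)\big)\Big]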

Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?

This work isolates and evaluates the claimed benefits of hierarchical RL on a suite of tasks encompassing locomotion, navigation, and manipulation and finds that most of the observed benefits of hierarchy can be attributed to improved exploration, as opposed to easier policy learning or imposed hierarchical structures.

Diversity is All You Need: Learning Skills without a Reward Function

DIAYN (“Diversity is All You Need”) learns useful skills without a reward function by maximizing an information-theoretic objective using a maximum-entropy policy.
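The objective has a compact form; a sketch following the paper's notation (skills z ~ p(z), states s, and a learned discriminator q_φ):

% DIAYN maximizes mutual information between skills and states, plus
% policy entropy, minus information between actions and skills given states:
\mathcal{F}(\theta) = I(S; Z) + \mathcal{H}[A \mid S] - I(A; Z \mid S)
% A variational lower bound with the discriminator q_phi(z|s) yields the
% per-step pseudo-reward used to train the skill-conditioned policy:
r_z(s, a) = \log q_\phi(z \mid s) - \log p(z)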

Hierarchical Reinforcement Learning By Discovering Intrinsic Options

The effectiveness of HIDIO is demonstrated against other reinforcement learning methods, achieving high rewards with better sample efficiency across a variety of robotic navigation and manipulation tasks.

Near-Optimal Representation Learning for Hierarchical Reinforcement Learning

Results on a number of difficult continuous-control tasks show that minimizing the developed notion of sub-optimality of a representation, defined in terms of the expected reward of the optimal hierarchical policy using this representation, yields qualitatively better representations as well as quantitatively better hierarchical policies compared to existing methods.
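That notion can be sketched as follows (notation assumed for illustration, not the paper's exact statement): a representation f is scored by how much value the best hierarchical policy restricted to f loses relative to the overall optimal policy.

% Hedged sketch of the sub-optimality measure (notation assumed):
\mathrm{SubOpt}(f) = \sup_{s}\, \big( V^{*}(s) - V^{\pi^{*}_{f}}(s) \big)
% where \pi^{*}_{f} is the best hierarchical policy whose goals are
% expressed in representation f; small SubOpt(f) means f is near-optimal.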

Planning in Learned Latent Action Spaces for Generalizable Legged Locomotion

This letter presents a fully learned hierarchical framework that jointly learns the low-level controller and the high-level latent action space, and shows that this framework outperforms baselines on multiple tasks and in two simulations.

Emergence of Locomotion Behaviours in Rich Environments

This paper explores how a rich environment can help to promote the learning of complex behavior, and finds that this encourages the emergence of robust behaviours that perform well across a suite of tasks.
...