Corpus ID: 3619097

Progressive Reinforcement Learning with Distillation for Multi-Skilled Motion Control

Glen Berseth, Kevin Xie, Paul Cernek, Michiel van de Panne
Deep reinforcement learning has demonstrated increasing capabilities for continuous control problems, including agents that can move with skill and agility through their environment. […] We extend policy distillation methods to the continuous action setting and leverage this technique to combine expert policies, as evaluated in the domain of simulated bipedal locomotion across different classes of terrain.
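
A minimal sketch of what distillation can look like with continuous actions. It assumes the experts and the student are Gaussian policies that share a fixed standard deviation σ. In that case KL(N(μ_e, σ²I) ‖ N(μ_s, σ²I)) = ‖μ_e - μ_s‖² / (2σ²), so matching the experts reduces to least-squares regression of the student's mean onto expert actions. The terrain names, the linear experts, the block-structured linear student, and the helpers `features`, `collect`, and `distill` are all hypothetical stand-ins for the neural-network policies and data collection used in the paper. This illustrates the idea and is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 4, 2
TERRAINS = ["flat", "incline", "steps"]  # hypothetical terrain classes

# Hypothetical linear experts, one per terrain: action = W_k @ state
experts = {t: rng.normal(size=(ACTION_DIM, STATE_DIM)) for t in TERRAINS}

def features(state, terrain):
    """Student input: the state placed in the block for its terrain (one-hot kron state)."""
    onehot = np.array([t == terrain for t in TERRAINS], dtype=float)
    return np.kron(onehot, state)

def collect(n_per_terrain):
    """Label states from every terrain with that terrain's expert action."""
    X, Y = [], []
    for t in TERRAINS:
        for _ in range(n_per_terrain):
            s = rng.normal(size=STATE_DIM)
            X.append(features(s, t))
            Y.append(experts[t] @ s)
    return np.array(X), np.array(Y)

def distill(X, Y):
    """Fit one linear student to all experts' labels by least squares (MSE)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W  # shape: (len(TERRAINS) * STATE_DIM, ACTION_DIM)

def gaussian_kl_same_sigma(mu_e, mu_s, sigma):
    """KL(N(mu_e, sigma^2 I) || N(mu_s, sigma^2 I)) = ||mu_e - mu_s||^2 / (2 sigma^2)."""
    return np.sum((mu_e - mu_s) ** 2) / (2.0 * sigma ** 2)

X, Y = collect(200)
student = distill(X, Y)
mse = np.mean((X @ student - Y) ** 2)  # distillation error over the pooled dataset
```

Because the student sees the terrain class only through its input features, one set of weights serves all terrains. With a neural-network student, the same regression loss would be minimized by gradient descent instead of `np.linalg.lstsq`.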

Learning Locomotion Skills for Cassie: Iterative Design and Sim-to-Real

An iterative design approach is described and documented, reflecting the multiple design iterations of the reward that are often (if not always) needed in practice; the work also demonstrates the transfer of policies learned in simulation to the physical robot without dynamics randomization.

Iterative Reinforcement Learning Based Design of Dynamic Locomotion Skills for Cassie

This paper proposes a practical method that allows the reward function to be fully redefined on each successive design iteration while limiting the deviation from the previous iteration, and demonstrates the effectiveness of this iterative-design approach on the bipedal robot Cassie.

Robust Recovery Controller for a Quadrupedal Robot using Deep Reinforcement Learning

This paper presents an approach based on model-free Deep Reinforcement Learning to control recovery maneuvers of quadrupedal robots, using a hierarchical behavior-based controller that manifests dynamic and reactive recovery behaviors, recovering from an arbitrary fall configuration in less than 5 seconds.

Self-Imitation Learning of Locomotion Movements through Termination Curriculum

A novel combination of techniques for accelerating the learning of stable locomotion movements through self-imitation learning of synthetic animations, using a curriculum learning approach called Termination Curriculum (TC) that adapts the episode termination threshold over time.

A New Framework for Multi-Agent Reinforcement Learning - Centralized Training and Exploration with Decentralized Execution via Policy Distillation

A new framework known as centralized training and exploration with decentralized execution via policy distillation is proposed; guided by this framework and the maximum-entropy learning technique, the resulting approach achieves significantly better performance and higher sample efficiency than a cutting-edge baseline on several multi-agent DRL benchmarks.

Learning to Walk via Deep Reinforcement Learning

Proposes a sample-efficient deep RL algorithm, based on maximum entropy RL, that requires minimal per-task tuning and only a modest number of trials to learn neural network policies, and achieves state-of-the-art performance on simulated benchmarks with a single set of hyperparameters.

Kickstarting Deep Reinforcement Learning

It is shown that, on a challenging and computationally-intensive multi-task benchmark (DMLab-30), kickstarted training improves the data efficiency of new agents, making it significantly easier to iterate on their design.
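
Kickstarting trains a new (student) agent with its usual RL loss plus an auxiliary term that pulls its policy toward a previously trained teacher, with the auxiliary weight reduced over training. A minimal sketch, assuming a discrete action space, a cross-entropy between teacher and student action distributions, and a hypothetical linear decay of the distillation weight λ. The schedule, `lam0`, and `decay_steps` are illustrative choices, not values from that paper.

```python
import numpy as np

def cross_entropy(teacher_probs, student_logits):
    """H(pi_T, pi_S) = -sum_a pi_T(a) log pi_S(a), averaged over the batch."""
    # Numerically stable log-softmax of the student's logits
    z = student_logits - student_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return float(-(teacher_probs * log_probs).sum(axis=-1).mean())

def kickstart_weight(step, lam0=1.0, decay_steps=10_000):
    """Hypothetical linear schedule: the distillation weight decays from lam0 to 0."""
    return lam0 * max(0.0, 1.0 - step / decay_steps)

def kickstarted_loss(rl_loss, teacher_probs, student_logits, step):
    """Student's own RL loss plus the decaying teacher-matching term."""
    return rl_loss + kickstart_weight(step) * cross_entropy(teacher_probs, student_logits)
```

Once the weight reaches zero the student optimizes only its RL objective, so it is free to surpass the teacher.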

Learning to Walk in the Real World with Minimal Human Effort

This paper develops a system for learning legged locomotion policies with deep RL in the real world with minimal human effort, combining a multi-task learning procedure, an automatic reset controller, and a safety-constrained RL framework.

Continual Model-Based Reinforcement Learning with Hypernetworks

HyperCRL, a method that continually learns the encountered dynamics in a sequence of tasks using task-conditional hypernetworks, outperforms existing continual learning alternatives that rely on fixed-capacity networks, and performs competitively with baselines that remember an ever-increasing coreset of past experience.

GLiDE: Generalizable Quadrupedal Locomotion in Diverse Environments with a Centroidal Model

This work explores how RL can be effectively used with a centroidal model to generate robust control policies for quadrupedal locomotion, and shows the potential of the method by demonstrating stepping-stone locomotion, two-legged in-place balance, balance-beam locomotion, and sim-to-real transfer without further adaptations.



DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning

This paper aims to learn a variety of environment-aware locomotion skills with a limited amount of prior knowledge by adopting a two-level hierarchical control framework and training both levels using deep reinforcement learning.

Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning

This work defines a novel method of multitask and transfer learning that enables an autonomous agent to learn how to behave in multiple tasks simultaneously, and then generalize its knowledge to new domains, and uses Atari games as a testing environment to demonstrate these methods.

Learning locomotion skills using DeepRL: does the choice of action space matter?

It is demonstrated that the local feedback provided by higher-level action parameterizations can significantly impact the learning, robustness, and motion quality of the resulting policies.

Policy Distillation

A novel method called policy distillation is presented that can be used to extract the policy of a reinforcement learning agent and train a new network that performs at the expert level while being dramatically smaller and more efficient.
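
Policy distillation trains the student on the teacher's outputs; one of the losses considered there is a KL divergence against the teacher's Q-values, sharpened by a softmax with a low temperature τ. A minimal NumPy sketch of that loss, assuming batched Q-value and logit arrays; the function names and the default τ = 0.01 are illustrative.

```python
import numpy as np

def log_softmax(x, tau=1.0):
    """Numerically stable log-softmax of x / tau along the last axis."""
    z = x / tau
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_kl(teacher_q, student_logits, tau=0.01):
    """KL(softmax(q_T / tau) || softmax(student_logits)), averaged over the batch."""
    log_p = log_softmax(teacher_q, tau)  # sharpened teacher distribution
    log_q = log_softmax(student_logits)  # student distribution at temperature 1
    p = np.exp(log_p)
    return float((p * (log_p - log_q)).sum(axis=-1).mean())
```

A low τ pushes the teacher distribution toward its greedy action, so the student is mainly trained to reproduce the teacher's action ranking rather than the raw scale of its Q-values.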

The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract)

The promise of ALE is illustrated by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning, and an evaluation methodology made possible by ALE is proposed.

Learning human behaviors from motion capture by adversarial imitation

Generative adversarial imitation learning is extended to enable training of generic neural network policies to produce humanlike movement patterns from limited demonstrations consisting only of partially observed state features, without access to actions, even when the demonstrations come from a body with different and unknown physical parameters.

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

Presents h-DQN, a framework that integrates hierarchical value functions, operating at different temporal scales, with intrinsically motivated deep reinforcement learning and allows for flexible goal specifications, such as functions over entities and relations.

Distral: Robust multitask reinforcement learning

This work proposes a new approach for joint training of multiple tasks, which it refers to as Distral (Distill & transfer learning), and shows that the proposed learning process is more robust and more stable, attributes that are critical in deep reinforcement learning.
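
A sketch of the Distral regularizer, assuming each task policy π_i is trained to maximize reward minus a KL penalty toward a shared distilled policy π_0 (weight c_KL) plus an entropy bonus (weight c_Ent). Under that assumption the per-step regularized reward is r - c_KL log(π_i/π_0) - c_Ent log π_i, which can be rewritten with α = c_KL / (c_KL + c_Ent) and β = 1 / (c_KL + c_Ent). The function names are hypothetical, and both functions compute the same quantity.

```python
import math

def distral_reward(r, log_pi_i, log_pi_0, c_kl, c_ent):
    """One-step reward with a KL penalty toward pi_0 and an entropy bonus for pi_i."""
    kl_term = c_kl * (log_pi_i - log_pi_0)  # single-sample estimate of KL(pi_i || pi_0)
    ent_term = c_ent * log_pi_i             # single-sample estimate of -entropy(pi_i)
    return r - kl_term - ent_term

def distral_reward_alpha_beta(r, log_pi_i, log_pi_0, c_kl, c_ent):
    """Same quantity with alpha = c_kl / (c_kl + c_ent) and beta = 1 / (c_kl + c_ent)."""
    alpha = c_kl / (c_kl + c_ent)
    beta = 1.0 / (c_kl + c_ent)
    return r + (alpha / beta) * log_pi_0 - (1.0 / beta) * log_pi_i

example = distral_reward(1.0, math.log(0.5), math.log(0.25), c_kl=0.5, c_ent=0.1)
```

Maximizing the discounted sum of this reward trades task return against staying close to π_0 and keeping π_i stochastic.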

Learning and Transfer of Modulated Locomotor Controllers

A novel architecture and training procedure for locomotion tasks: where a monolithic end-to-end architecture fails completely, learning with a pre-trained spinal module succeeds at multiple high-level tasks and enables the effective exploration required to learn from sparse rewards.

A Deep Hierarchical Approach to Lifelong Learning in Minecraft

We propose a lifelong learning system that has the ability to reuse and transfer knowledge from one task to another while efficiently retaining the previously learned knowledge-base. Knowledge is