DMAP: a Distributed Morphological Attention Policy for Learning to Locomote with a Changing Body

Alberto Silvio Chiappa, Alessandro Marin Vargas, Alexander Mathis
Biological and artificial agents need to deal with constant changes in the real world. We study this problem in four classical continuous control environments, augmented with morphological perturbations. Learning to locomote when the length and the thickness of different body parts vary is challenging, as the control policy is required to adapt to the morphology to successfully balance and advance the agent. We show that a control policy based on the proprioceptive state performs poorly with…

One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control

It is shown that a single modular policy can successfully generate locomotion behaviors for several planar agents with different skeletal structures such as monopod hoppers, quadrupeds, bipeds, and generalize to variants not seen during training -- a process that would normally require training and manual hyperparameter tuning for each morphology.

AnyMorph: Learning Transferable Polices By Inferring Agent Morphology

This work proposes the first reinforcement learning algorithm that can train a policy to generalize to new agent morphologies without requiring a description of the agent’s morphology in advance, and it attains good performance without that explicit description.

Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity

This paper investigates a modular co-evolution strategy: a collection of primitive agents learns to dynamically self-assemble into composite bodies while also learning to coordinate their behavior to control these bodies.

Learning Transferable Motor Skills with Hierarchical Latent Mixture Policies

Analysis and ablations reveal that both continuous and discrete components are beneficial, and that the learned hierarchical skills are most useful in sparse-reward settings, as they encourage directed exploration of task-relevant parts of the state space.

My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control

Motivated by the hypothesis that any benefits GNNs extract from the graph structure are outweighed by difficulties they create for message passing, Amorpheus is proposed, a transformer-based approach that substantially outperforms GNN-based methods.


This work finds that out-of-distribution performance of self-supervised models is correlated with degradation in reward; it trains algorithms on selected RL environments and tests transfer performance on perturbed environments.

Smooth Exploration for Robotic Reinforcement Learning

gSDE is evaluated both in simulation, on PyBullet continuous control tasks, and directly on three different real robots: a tendon-driven elastic robot, a quadruped and an RC car, which allows training directly on the real robots without loss of performance.

Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning

This work uses meta-learning to train a dynamics model prior such that, when combined with recent data, this prior can be rapidly adapted to the local context and demonstrates the importance of incorporating online adaptation into autonomous agents that operate in the real world.

Sub-policy Adaptation for Hierarchical Reinforcement Learning

This work proposes a novel algorithm to discover a set of skills and continuously adapt them along with the higher level, even when training on a new task, and introduces Hierarchical Proximal Policy Optimization (HiPPO), an on-policy method to efficiently train all levels of the hierarchy jointly.

Sim-to-Real Transfer of Robotic Control with Dynamics Randomization

By randomizing the dynamics of the simulator during training, this paper is able to develop policies that are capable of adapting to very different dynamics, including ones that differ significantly from the dynamics on which the policies were trained.