LASER: Learning a Latent Action Space for Efficient Reinforcement Learning

@article{Allshire2021LASERLA,
  title={LASER: Learning a Latent Action Space for Efficient Reinforcement Learning},
  author={Arthur Allshire and Roberto Mart{\'i}n-Mart{\'i}n and Charles Lin and Shawn Manuel and Silvio Savarese and Animesh Garg},
  journal={2021 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2021},
  pages={6650-6656}
}
The process of learning a manipulation task depends strongly on the action space used for exploration: posed in the incorrect action space, solving a task with reinforcement learning can be drastically inefficient. Additionally, similar tasks or instances of the same task family impose latent manifold constraints on the most effective action space: the task family can be best solved with actions in a manifold of the entire action space of the robot. Combining these insights, we present LASER, a…
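To make the abstract's idea concrete, here is a minimal sketch of acting through a learned latent action space, assuming a PyTorch setup; it is illustrative only, not the authors' implementation, and the module names, dimensions, and [-1, 1] action range are assumptions. A decoder maps a low-dimensional latent action z to the robot's full action space, and the RL agent explores only in z:

import torch
import torch.nn as nn

class LatentActionDecoder(nn.Module):
    """Maps a latent action z (dim_z) to a full robot action (dim_a)."""
    def __init__(self, dim_z: int, dim_a: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_z, hidden), nn.ReLU(),
            nn.Linear(hidden, dim_a), nn.Tanh(),  # assumes actions in [-1, 1]
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

class LatentActionWrapper:
    """Exposes a low-dimensional latent action space to the RL agent."""
    def __init__(self, env, decoder: LatentActionDecoder):
        self.env, self.decoder = env, decoder

    def step(self, z):
        with torch.no_grad():
            action = self.decoder(torch.as_tensor(z, dtype=torch.float32))
        return self.env.step(action.numpy())

One plausible way to train such a decoder (again, an assumption rather than the paper's recipe) is as the decoder of a variational autoencoder fit to actions from related tasks, so that the latent space captures the task family's manifold.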

Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives
TLDR
A simple change to the action interface between the RL algorithm and the robot substantially improves both learning efficiency and task performance irrespective of the underlying RL algorithm, significantly outperforming prior methods that learn skills from offline expert data.
Augmenting Reinforcement Learning with Behavior Primitives for Diverse Manipulation Tasks
TLDR
This work introduces MAnipulation Primitive-augmented reinforcement LEarning (MAPLE), a learning framework that augments standard reinforcement learning algorithms with a pre-defined library of behavior primitives, robust functional modules specialized in achieving manipulation goals, such as grasping and pushing.
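As a rough illustration of the primitive-augmented action interface the two entries above describe, a sketch (not the MAPLE codebase; the primitive names and horizon are hypothetical) might dispatch a discrete primitive index plus continuous parameters:

import numpy as np

def reach(env, params):
    # Hypothetical primitive: repeat a parameterized low-level command for a
    # fixed horizon, standing in for a hand-coded reaching controller.
    result = None
    for _ in range(10):
        result = env.step(params)
    return result

def atomic(env, params):
    # Single low-level action, keeping the full action space reachable
    # alongside the primitives.
    return env.step(params)

PRIMITIVES = [reach, atomic]

def execute(env, primitive_idx: int, params: np.ndarray):
    # One "action" in the augmented space: a discrete primitive choice plus
    # that primitive's continuous parameters.
    return PRIMITIVES[primitive_idx](env, params)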
Learning Robotic Manipulation Skills Using an Adaptive Force-Impedance Action Space
TLDR
This work proposes to factor the learning problem into a hierarchical learning and adaptation architecture to get the best of both worlds in real-world robotics, and combines these components through a bio-inspired action space called AFORCE.
OSCAR: Data-Driven Operational Space Control for Adaptive and Robust Robot Manipulation
TLDR
This work proposes OSC for Adaptation and Robustness (OSCAR), a data-driven variant of OSC that compensates for modeling errors by inferring relevant dynamics parameters from online trajectories, and enables robust zero-shot performance under out-of-distribution conditions and rapid adaptation to significant domain shifts through additional finetuning.
Learning to Compose Behavior Primitives for Near-Decomposable Manipulation Tasks
TLDR
A reinforcement learning framework in which the agent is equipped with a pre-built library of manipulation primitives that achieve simple yet versatile behaviors, solving tasks substantially more efficiently than existing approaches.
A Simple Approach to Continual Learning by Transferring Skill Parameters
TLDR
It is shown how to continually acquire robotic manipulation skills without forgetting, using far fewer samples than needed to train them from scratch, given an appropriate curriculum.
GLiDE: Generalizable Quadrupedal Locomotion in Diverse Environments with a Centroidal Model
TLDR
This work explores how RL can be effectively used with a centroidal model to generate robust control policies for quadrupedal locomotion, and shows the potential of the method by demonstrating stepping-stone locomotion, two-legged in-place balancing, balance-beam locomotion, and sim-to-real transfer without further adaptation.

References

Showing 1-10 of 35 references
Learning Action Representations for Reinforcement Learning
TLDR
This work provides an algorithm to both learn and use action representations, gives conditions for its convergence, and demonstrates the efficacy of the proposed method on large-scale real-world problems.
A Comparison of Action Spaces for Learning Manipulation Tasks
TLDR
This paper compares learning performance across three tasks, four action spaces, and two modern reinforcement learning algorithms, lending support to the hypothesis that learning references for a task-space impedance controller significantly reduces the number of samples needed to achieve good performance across all tasks and algorithms.
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
TLDR
h-DQN is presented, a framework to integrate hierarchical value functions, operating at different temporal scales, with intrinsically motivated deep reinforcement learning, and allows for flexible goal specifications, such as functions over entities and relations.
RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning
TLDR
This paper proposes to represent a "fast" reinforcement learning algorithm as a recurrent neural network (RNN) and learn it from data; the fast algorithm is encoded in the weights of the RNN, which are learned slowly through a general-purpose ("slow") RL algorithm.
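A minimal sketch of that interface (the GRU choice and dimensions are assumptions, not the paper's exact architecture): the recurrent policy consumes the previous action, reward, and termination flag along with the observation, so its hidden state can implement the "fast" learner across episode boundaries:

import torch
import torch.nn as nn

class RL2Policy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        # Input: observation + previous action + previous reward + done flag.
        self.rnn = nn.GRUCell(obs_dim + act_dim + 2, hidden)
        self.pi = nn.Linear(hidden, act_dim)

    def forward(self, obs, prev_act, prev_rew, done, h):
        x = torch.cat([obs, prev_act, prev_rew, done], dim=-1)
        h = self.rnn(x, h)    # hidden state persists across episode boundaries
        return self.pi(h), h  # action (e.g. mean of a Gaussian) + new state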
End-to-End Training of Deep Visuomotor Policies
TLDR
This paper develops a method that can be used to learn policies that map raw image observations directly to torques at the robot's motors, trained using a partially observed guided policy search method, with supervision provided by a simple trajectory-centric reinforcement learning method.
Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation
TLDR
CAVIN Planner is presented, a model-based method that hierarchically generates plans by sampling from latent spaces that decouple the prediction of high-level effects from the generation of low-level motions through cascaded variational inference to facilitate planning over long time horizons.
Plannable Approximations to MDP Homomorphisms: Equivariance under Actions
TLDR
A contrastive loss function is introduced that enforces action equivariance on the learned representations, and it is proved that when the loss is zero, the optimal policy in the abstract MDP can be successfully lifted to the original MDP.
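In the spirit of that summary, a hedged sketch of such a contrastive, action-equivariant loss (the exact form in the paper may differ): a latent transition model's prediction should match the next state's embedding, while embeddings of negative states are pushed at least a margin away:

import torch
import torch.nn.functional as F

def equivariance_loss(pred_next, phi_next, phi_neg, margin: float = 1.0):
    # pred_next: T(phi(s), a), shape (B, D); phi_next: phi(s'), shape (B, D);
    # phi_neg: embeddings of negative samples, shape (B, K, D).
    positive = F.mse_loss(pred_next, phi_next)   # pull T(phi(s), a) to phi(s')
    dist_neg = (pred_next.unsqueeze(1) - phi_neg).pow(2).sum(-1).sqrt()
    negative = F.relu(margin - dist_neg).mean()  # hinge: push negatives away
    return positive + negative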
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning.
Dynamics-aware Embeddings
TLDR
By combining state and action embeddings, this paper achieves efficient learning of high-quality policies on goal-conditioned continuous control from pixel observations in only 1-2 million environment steps.
Learning Options in Reinforcement Learning
TLDR
This paper empirically explores a simple approach to creating options based on the intuition that states that are frequently visited on system trajectories could prove to be useful subgoals, and proposes a greedy algorithm for identifying subgoals based on state visitation counts.
...