Corpus ID: 232233726

Solving Compositional Reinforcement Learning Problems via Task Reduction

@article{Li2021SolvingCR,
  title={Solving Compositional Reinforcement Learning Problems via Task Reduction},
  author={Yunfei Li and Yilin Wu and Huazhe Xu and Xiaolong Wang and Yi Wu},
  journal={ArXiv},
  year={2021},
  volume={abs/2103.07607}
}
We propose a novel learning paradigm, Self-Imitation via Reduction (SIR), for solving compositional reinforcement learning problems. SIR is based on two core ideas: task reduction and self-imitation. Task reduction tackles a hard-to-solve task by actively reducing it to an easier task whose solution is known by the RL agent. Once the original hard task is successfully solved by task reduction, the agent naturally obtains a self-generated solution trajectory to imitate. By continuously… 
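The abstract describes a loop of two steps: reduce a hard task to an easier, already-solved one, then imitate the self-generated solution trajectory. The following is a toy, illustrative sketch of that loop only; the integer "difficulty" model, the class name, and all methods are hypothetical simplifications and do not reflect the paper's actual environments or algorithm.

```python
# Toy sketch of the Self-Imitation via Reduction (SIR) loop.
# Hypothetical model: a task is an integer difficulty, and the agent
# "knows" every task whose difficulty is at most its current skill.

class ToySIRAgent:
    def __init__(self, skill=1):
        self.skill = skill          # tasks with difficulty <= skill are solvable
        self.demonstrations = []    # self-generated trajectories to imitate

    def can_solve(self, difficulty):
        return difficulty <= self.skill

    def reduce_task(self, difficulty):
        """Task reduction: map a hard task to an easier task whose
        solution is already known, if one exists."""
        easier = difficulty - 1
        return easier if self.can_solve(easier) else None

    def attempt(self, difficulty):
        """Try the task directly, or via reduction; return a trajectory."""
        if self.can_solve(difficulty):
            return [difficulty]             # direct solution
        easier = self.reduce_task(difficulty)
        if easier is None:
            return None                     # reduction failed
        return [easier, difficulty]         # solve easier task, then the rest

    def self_imitate(self, trajectory):
        """Imitating the self-generated trajectory expands the skill set."""
        self.demonstrations.append(trajectory)
        self.skill = max(self.skill, max(trajectory))

agent = ToySIRAgent(skill=1)
for task in [2, 3, 4]:                      # curriculum of harder tasks
    traj = agent.attempt(task)
    if traj is not None:
        agent.self_imitate(traj)

print(agent.skill)                          # 4
print(len(agent.demonstrations))            # 3
```

Each successful reduction turns a previously unsolvable task into a demonstration, which in turn makes the next harder task reducible; this is the compositional bootstrapping the abstract refers to.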
Environment Generation for Zero-Shot Compositional Reinforcement Learning
TLDR
This work presents Compositional Design of Environments (CoDE), which trains a Generator agent to automatically build a series of compositional tasks tailored to the RL agent's current skill level; CoDE learns to generate environments composed of multiple pages or rooms and trains RL agents capable of completing a wide range of complex tasks in those environments.
Multi-Task Learning with Sequence-Conditioned Transporter Networks
TLDR
This work proposes a new suite of benchmarks specifically aimed at compositional tasks, MultiRavens, and a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments goal conditioning with weighted sampling and can efficiently learn to solve multi-task, long-horizon problems.
Disentangled Attention as Intrinsic Regularization for Bimanual Multi-Object Manipulation
TLDR
Experimental results show that the proposed intrinsic regularization successfully avoids domination and reduces conflicts for the policies, which leads to significantly more effective cooperative strategies than all the baselines.
DAIR: Disentangled Attention Intrinsic Regularization for Safe and Efficient Bimanual Manipulation
TLDR
Experimental results show that the proposed intrinsic regularization successfully avoids domination and reduces conflicts for the policies, which leads to significantly more efficient and safer cooperative strategies than all the baselines.
A Simple Approach to Continual Learning by Transferring Skill Parameters
TLDR
It is shown how to continually acquire robotic manipulation skills without forgetting, and with far fewer samples than needed to train them from scratch, given an appropriate curriculum.
Towards a Framework for Comparing the Complexity of Robotic Tasks
TLDR
A notion of reduction is formalized that captures the following intuition: Task 1 reduces to Task 2 if one can efficiently transform any policy that solves Task 2 into a policy that solves Task 1.
Comparing the Complexity of Robotic Tasks
TLDR
A notion of reduction is defined that formalizes the following intuition: Task 1 reduces to Task 2 if one can efficiently transform any policy that solves Task 2 into a policy that solves Task 1.
Learning to Design and Construct Bridge without Blueprint
TLDR
A bi-level robot system that learns a bridge blueprint policy in a physical simulator using deep reinforcement learning and curriculum learning, and implements a motion-planning-based policy for real-robot motion control, which can be directly combined with a trained blueprint policy for real-world bridge construction without tuning.

References

SHOWING 1-10 OF 80 REFERENCES
Learning by Playing - Solving Sparse Reward Tasks from Scratch
TLDR
The key idea behind the method is that active (learned) scheduling and execution of auxiliary policies allows the agent to efficiently explore its environment, enabling it to excel at sparse-reward RL.
Exploration via Hindsight Goal Generation
TLDR
HGG is introduced, a novel algorithmic framework that generates valuable hindsight goals which are easy for an agent to achieve in the short term and also have the potential to guide the agent toward the actual goal in the long term.
Efficient Exploration with Self-Imitation Learning via Trajectory-Conditioned Policy
This paper proposes a method for learning a trajectory-conditioned policy to imitate diverse demonstrations from the agent's own past experiences. We demonstrate that such self-imitation drives…
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
TLDR
h-DQN is presented, a framework to integrate hierarchical value functions, operating at different temporal scales, with intrinsically motivated deep reinforcement learning, and allows for flexible goal specifications, such as functions over entities and relations.
Language as an Abstraction for Hierarchical Deep Reinforcement Learning
TLDR
This paper introduces an open-source object interaction environment built using the MuJoCo physics engine and the CLEVR engine, and finds that, using the approach, agents can learn to solve diverse, temporally extended tasks such as object sorting and multi-object rearrangement, including from raw pixel observations.
Modular Multitask Reinforcement Learning with Policy Sketches
TLDR
Experiments show that using the approach to learn policies guided by sketches gives better performance than existing techniques for learning task-specific or shared policies, while naturally inducing a library of interpretable primitive behaviors that can be recombined to rapidly adapt to new tasks.
Towards Practical Multi-Object Manipulation using Relational Reinforcement Learning
TLDR
It is shown that graph-based relational architectures overcome this limitation and enable learning of complex tasks when provided with a simple curriculum of tasks with increasing numbers of objects, and exhibits zero-shot generalization.
Composable Deep Reinforcement Learning for Robotic Manipulation
TLDR
This paper shows that policies learned with soft Q-learning can be composed to create new policies, and that the optimality of the resulting policy can be bounded in terms of the divergence between the composed policies.
DARLA: Improving Zero-Shot Transfer in Reinforcement Learning
TLDR
A new multi-stage RL agent, DARLA (DisentAngled Representation Learning Agent), which learns to see before learning to act, which significantly outperforms conventional baselines in zero-shot domain adaptation scenarios.
Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning
TLDR
An open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks is proposed, enabling the development of algorithms that generalize and thereby accelerate the acquisition of entirely new, held-out tasks.