Differentiable Algorithm Networks for Composable Robot Learning

@article{Karkus2019DifferentiableAN,
  title={Differentiable Algorithm Networks for Composable Robot Learning},
  author={Peter Karkus and Xiao Ma and David Hsu and Leslie Pack Kaelbling and Wee Sun Lee and Tomas Lozano-Perez},
  journal={ArXiv},
  year={2019},
  volume={abs/1905.11602}
}
This paper introduces the Differentiable Algorithm Network (DAN), a composable architecture for robot learning systems. A DAN is composed of neural network modules, each encoding a differentiable robot algorithm and an associated model; and it is trained end-to-end from data. DAN combines the strengths of model-driven modular system design and data-driven end-to-end learning. The algorithms and models act as structural assumptions to reduce the data requirements for learning; end-to-end… 
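
The composability idea is easy to make concrete. Below is a minimal, hypothetical PyTorch sketch (module internals are illustrative stand-ins, not the paper's actual filter or planner architectures): a recurrent filter module maintains a belief from observations, an iterative planner module maps the belief to action logits, and the composed network is trained end-to-end on expert actions.

```python
import torch
import torch.nn as nn

class FilterModule(nn.Module):
    """Differentiable state estimator: belief' = f(belief, observation)."""
    def __init__(self, obs_dim, belief_dim):
        super().__init__()
        self.update = nn.GRUCell(obs_dim, belief_dim)

    def forward(self, belief, obs):
        return self.update(obs, belief)

class PlannerModule(nn.Module):
    """Differentiable planner stand-in: iterative refinement, then a readout."""
    def __init__(self, belief_dim, num_actions, iters=5):
        super().__init__()
        self.step = nn.Linear(belief_dim, belief_dim)
        self.head = nn.Linear(belief_dim, num_actions)
        self.iters = iters

    def forward(self, belief):
        v = belief
        for _ in range(self.iters):        # fixed number of "planning" iterations
            v = torch.relu(self.step(v)) + belief
        return self.head(v)

class DAN(nn.Module):
    def __init__(self, obs_dim=16, belief_dim=32, num_actions=4):
        super().__init__()
        self.belief_dim = belief_dim
        self.filter = FilterModule(obs_dim, belief_dim)
        self.planner = PlannerModule(belief_dim, num_actions)

    def forward(self, obs_seq):            # obs_seq: (T, B, obs_dim)
        belief = obs_seq.new_zeros(obs_seq.shape[1], self.belief_dim)
        logits = []
        for obs in obs_seq:
            belief = self.filter(belief, obs)    # filtering module
            logits.append(self.planner(belief))  # planning module
        return torch.stack(logits)

# End-to-end imitation loss backpropagates through both modules:
model = DAN()
obs_seq = torch.randn(10, 8, 16)           # T=10 steps, batch of 8
expert = torch.randint(0, 4, (10, 8))
loss = nn.functional.cross_entropy(model(obs_seq).flatten(0, 1), expert.flatten())
loss.backward()
```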

Citations

Locally-Connected Interrelated Network: A Forward Propagation Primitive

Simulation tests on benchmark problems involving 2D and 3D navigation and grasping show promising results: replacing the forward propagation module alone with LCI-Net improves the generalization capability of VIN and QMDP-Net by more than 3× and 10×, respectively.

Robotic Manipulation Skill Acquisition Via Demonstration Policy Learning

A learning-by-imitation approach that learns a demonstration policy for robotic manipulation skill acquisition from what-where-how interaction data, improving the robot's adaptability to new environments and tasks with fewer training inputs.

Reinforcement Learning based Multi-Robot Classification via Scalable Communication Structure

It is shown that the proposed architecture achieves classification accuracy comparable to centralized methods, maintains high performance for varying numbers of robots without additional training cost, and is robust to hacking and to the loss of robots in the network.

Learning Task-Driven Control Policies via Information Bottlenecks

A reinforcement learning approach to synthesizing task-driven control policies for robotic systems equipped with rich sensory modalities, based on a policy-gradient-style algorithm that constrains actions to depend only on task-relevant information.
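
A hedged sketch of the general information-bottleneck mechanism this points at (an assumed form, not the paper's exact derivation): observations are encoded into a stochastic latent, actions depend only on that latent, and a KL penalty to a fixed prior limits how much task-irrelevant information reaches the policy.

```python
import torch
import torch.nn as nn

class BottleneckPolicy(nn.Module):
    def __init__(self, obs_dim, z_dim, act_dim):
        super().__init__()
        self.enc = nn.Linear(obs_dim, 2 * z_dim)   # mean and log-variance
        self.pi = nn.Linear(z_dim, act_dim)

    def forward(self, obs):
        mu, logvar = self.enc(obs).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        # KL( N(mu, sigma) || N(0, I) ), summed over latent dimensions
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1)
        return self.pi(z), kl

policy = BottleneckPolicy(obs_dim=24, z_dim=4, act_dim=2)
action, kl = policy(torch.randn(32, 24))
# Training would maximize task reward while penalizing beta * kl.mean().
```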

Beyond Tabula-Rasa: a Modular Reinforcement Learning Approach for Physically Embedded 3D Sokoban

This work explores whether integrated tasks like Mujoban can be solved by composing RL modules in a sense-plan-act hierarchy, where modules have well-defined roles as in classic robot architectures, and finds that the modular RL approach dramatically outperforms a state-of-the-art monolithic RL agent on Mujoban.

From Machine Learning to Robotics: Challenges and Opportunities for Embodied Intelligence

Contrary to viewing embodied intelligence as another application domain for machine learning, here it is argued that it is in fact a key driver for the advancement of machine learning technology.

How to Train Your Differentiable Filter

This work implements differentiable filters (DFs) with four different underlying filtering algorithms and finds that sufficiently long training sequences are crucial for DF performance and that modeling heteroscedastic observation noise significantly improves results.
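
For concreteness, here is one step of a differentiable Kalman filter with observation-dependent (heteroscedastic) noise, in the spirit of the finding above; this is a simplified linear sketch with assumed module shapes, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DiffKalmanStep(nn.Module):
    def __init__(self, state_dim, obs_dim):
        super().__init__()
        self.A = nn.Parameter(torch.eye(state_dim))              # learned dynamics
        self.H = nn.Parameter(torch.zeros(obs_dim, state_dim))   # learned obs model
        self.Q = nn.Parameter(torch.eye(state_dim) * 0.1)        # process noise factor
        # Heteroscedastic noise: R depends on the observation itself.
        self.r_net = nn.Linear(obs_dim, obs_dim)

    def forward(self, mu, P, y):
        # Predict
        mu_p = mu @ self.A.T
        P_p = self.A @ P @ self.A.T + self.Q @ self.Q.T         # keep covariance PSD
        # Observation-dependent diagonal noise covariance
        R = torch.diag_embed(nn.functional.softplus(self.r_net(y)) + 1e-4)
        # Update (batched): K = P H^T (H P H^T + R)^-1
        S = self.H @ P_p @ self.H.T + R
        K = P_p @ self.H.T @ torch.linalg.inv(S)
        innov = y - mu_p @ self.H.T
        mu_new = mu_p + (K @ innov.unsqueeze(-1)).squeeze(-1)
        P_new = P_p - K @ self.H @ P_p
        return mu_new, P_new

step = DiffKalmanStep(state_dim=4, obs_dim=2)
mu = torch.zeros(8, 4)
P = torch.eye(4).expand(8, 4, 4).clone()
y = torch.randn(8, 2)
mu, P = step(mu, P, y)     # differentiable w.r.t. all learned parameters
```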

Neuro-algorithmic Policies enable Fast Combinatorial Generalization

A neuro-algorithmic policy architecture consisting of a neural network with an embedded time-dependent shortest-path solver is introduced; it generalizes well to unseen variations of the environment and can be trained end-to-end by blackbox differentiation.
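
The blackbox-differentiation scheme referenced here (due to Vlastelica et al.) is compact enough to sketch: the forward pass calls the solver as-is, and the backward pass calls it once more on costs perturbed by the incoming gradient, returning a finite-difference-style gradient. The solver below is a trivial placeholder standing in for the time-dependent shortest-path solver.

```python
import torch

def solve(costs):
    """Placeholder solver: a per-row argmin returned as a 0/1 'plan'."""
    y = torch.zeros_like(costs)
    y[torch.arange(costs.shape[0]), costs.argmin(dim=1)] = 1.0
    return y

class BlackboxSolver(torch.autograd.Function):
    @staticmethod
    def forward(ctx, costs, lam=10.0):
        y = solve(costs)
        ctx.lam = lam
        ctx.save_for_backward(costs, y)
        return y

    @staticmethod
    def backward(ctx, grad_y):
        costs, y = ctx.saved_tensors
        # Perturb costs in the direction of the incoming gradient,
        # re-solve, and use the change in the solution as the gradient.
        y_lam = solve(costs + ctx.lam * grad_y)
        return (y_lam - y) / ctx.lam, None

costs = torch.randn(4, 8, requires_grad=True)   # e.g. predicted edge costs
plan = BlackboxSolver.apply(costs)
plan.sum().backward()                           # gradients reach the cost network
```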

Model-Augmented Actor-Critic: Backpropagating through Paths

This paper builds a policy optimization algorithm that uses the pathwise derivative of the learned model and policy across future timesteps; it matches the asymptotic performance of model-free algorithms and scales to long horizons, a regime where past model-based approaches have typically struggled.
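
A minimal sketch of the pathwise, backprop-through-model objective (names, dimensions, and the reward are illustrative, not the paper's exact objective): roll the policy through a learned differentiable model for H steps and differentiate the discounted rewards plus a terminal critic value.

```python
import torch
import torch.nn as nn

S, A, H = 8, 2, 5
model = nn.Sequential(nn.Linear(S + A, 64), nn.Tanh(), nn.Linear(64, S))
policy = nn.Sequential(nn.Linear(S, 64), nn.Tanh(), nn.Linear(64, A))
critic = nn.Sequential(nn.Linear(S, 64), nn.Tanh(), nn.Linear(64, 1))

def reward(s, a):                   # illustrative differentiable reward
    return -(s.pow(2).sum(-1) + 0.1 * a.pow(2).sum(-1))

s = torch.randn(32, S)              # batch of start states
ret = torch.zeros(32)
for t in range(H):                  # unrolled rollout through the learned model
    a = policy(s)
    ret = ret + (0.99 ** t) * reward(s, a)
    s = s + model(torch.cat([s, a], dim=-1))     # residual next-state prediction
ret = ret + (0.99 ** H) * critic(s).squeeze(-1)  # terminal value estimate

loss = -ret.mean()                  # ascend the pathwise objective
loss.backward()                     # gradients flow through model and policy
```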

References

SHOWING 1-10 OF 53 REFERENCES

Universal Planning Networks

This work finds that the representations learned are not only effective for goal-directed visual imitation via gradient-based trajectory optimization, but can also provide a metric for specifying goals using images.
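
The gradient-based trajectory optimization at the heart of this approach can be sketched as follows (encoder and latent dynamics are stand-in modules, held fixed during planning): an action sequence is optimized by gradient descent so that the predicted latent rollout reaches the encoded goal.

```python
import torch
import torch.nn as nn

Z, A, H = 16, 2, 8
encode = nn.Linear(32, Z)                     # stand-in image encoder
dynamics = nn.Linear(Z + A, Z)                # stand-in latent dynamics

z0 = encode(torch.randn(1, 32)).detach()      # current observation (params fixed)
zg = encode(torch.randn(1, 32)).detach()      # goal observation
actions = torch.zeros(H, 1, A, requires_grad=True)
opt = torch.optim.SGD([actions], lr=0.1)      # only the actions are updated

for _ in range(50):                           # inner planning loop
    z = z0
    for t in range(H):
        z = dynamics(torch.cat([z, actions[t]], dim=-1))
    plan_loss = (z - zg).pow(2).sum()         # latent distance to the goal
    opt.zero_grad()
    plan_loss.backward()
    opt.step()
```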

QMDP-Net: Deep Learning for Planning under Partial Observability

While QMDP-net encodes the QMDP algorithm, it sometimes outperforms the QMDP algorithm in the experiments, as a result of end-to-end learning.
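
The QMDP rule that QMDP-net embeds is simple to state: solve the underlying fully observable MDP for Q(s, a), then score each action by its expectation under the current belief. A small NumPy sketch (in QMDP-net itself, Q comes from a learned value-iteration module rather than known T and R):

```python
import numpy as np

def qmdp_values(T, R, gamma=0.95, iters=100):
    """T: (A, S, S) transitions, R: (S, A) rewards -> Q: (S, A)."""
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)                        # (S,)
        Q = R + gamma * np.einsum('ast,t->sa', T, V)
    return Q

def qmdp_action(belief, Q):
    return np.argmax(belief @ Q)                 # weight Q-values by the belief

# Tiny example: 3 states, 2 actions
rng = np.random.default_rng(0)
T = rng.dirichlet(np.ones(3), size=(2, 3))       # each T[a, s] sums to 1
R = rng.normal(size=(3, 2))
Q = qmdp_values(T, R)
print(qmdp_action(np.array([0.6, 0.3, 0.1]), Q))
```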

Path Integral Networks: End-to-End Differentiable Optimal Control

Preliminary experimental results show that PI-Net, trained by imitation learning, can mimic control demonstrations on two simulated problems: a linear system and a pendulum swing-up task.
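
The path-integral control rule that PI-Net unrolls as network layers can be sketched in an MPPI-like form (the pendulum dynamics and cost below are illustrative): sample noisy control sequences, roll them out, and reweight by exponentiated negative cost.

```python
import numpy as np

H, K = 20, 128                                   # horizon, number of samples
lam, dt = 1.0, 0.05

def step(x, u):                                  # pendulum: x = [theta, omega]
    th, om = x
    return np.array([th + om * dt, om + (-9.8 * np.sin(th) + u) * dt])

def rollout_cost(x0, controls):
    x, c = x0, 0.0
    for u in controls:
        x = step(x, u)
        c += (x[0] - np.pi) ** 2 + 0.01 * u ** 2    # swing-up cost
    return c

u_nom = np.zeros(H)
x0 = np.array([0.0, 0.0])
for _ in range(10):                              # path-integral iterations
    eps = np.random.randn(K, H)                  # control perturbations
    costs = np.array([rollout_cost(x0, u_nom + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    u_nom = u_nom + w @ eps                      # cost-weighted update
```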

Goal-driven dynamics learning via Bayesian optimization

This work uses Bayesian optimization in an active learning framework where a locally linear dynamics model is learned with the intent of maximizing control performance, then used in conjunction with optimal control schemes to efficiently design a controller for a given task.

Value Iteration Networks

This work introduces the value iteration network (VIN), a fully differentiable neural network with a 'planning module' embedded within; by learning an explicit planning computation, VIN policies generalize better to new, unseen domains.
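
The VIN planning module itself is a few lines: value iteration on a grid becomes a convolution (one output channel per action) followed by a max over the action channels, repeated K times. A simplified sketch (the kernel wiring here is illustrative, not the paper's exact parameterization):

```python
import torch
import torch.nn as nn

class ValueIterationModule(nn.Module):
    def __init__(self, num_actions=8, K=20):
        super().__init__()
        # 3x3 kernels play the role of local, learned transition models.
        self.q_conv = nn.Conv2d(2, num_actions, kernel_size=3, padding=1, bias=False)
        self.K = K

    def forward(self, reward_map):
        # reward_map: (B, 1, H, W), e.g. produced from the map image
        v = torch.zeros_like(reward_map)
        for _ in range(self.K):
            q = self.q_conv(torch.cat([reward_map, v], dim=1))  # (B, A, H, W)
            v, _ = q.max(dim=1, keepdim=True)                   # Bellman backup
        return q                        # Q-values; index at the robot's cell

vi = ValueIterationModule()
q = vi(torch.randn(4, 1, 16, 16))       # (4, 8, 16, 16)
```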

Learning to Navigate in Complex Environments

This work jointly learns the goal-driven reinforcement learning problem with auxiliary depth-prediction and loop-closure classification tasks, and shows that data efficiency and task performance can be dramatically improved by these auxiliary tasks leveraging multimodal sensory inputs.

Particle Filter Networks with Application to Visual Localization

The Particle Filter Network, which encodes both a system model and a particle filter algorithm in a single neural network, is introduced; it consistently outperforms alternative learning architectures, as well as a traditional model-based method, under a variety of sensor inputs.
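
One step of such a differentiable particle filter, including the soft-resampling trick that keeps gradients flowing through the resampler, can be sketched as follows (the motion and observation networks are stand-ins):

```python
import torch
import torch.nn as nn

K, S = 64, 3                                    # particles, state dimension
motion = nn.Linear(S + 2, S)                    # stand-in motion model
obs_likelihood = nn.Sequential(nn.Linear(S + 8, 32), nn.Tanh(), nn.Linear(32, 1))

def pf_step(particles, log_w, action, obs, alpha=0.5):
    # Transition: move every particle with the (learned) motion model
    act = action.expand(K, -1)
    particles = particles + motion(torch.cat([particles, act], dim=-1))
    # Measurement: reweight particles by the learned observation likelihood
    o = obs.expand(K, -1)
    log_w = log_w + obs_likelihood(torch.cat([particles, o], dim=-1)).squeeze(-1)
    log_w = log_w - torch.logsumexp(log_w, dim=0)
    # Soft resampling: sample from q = alpha*w + (1-alpha)*uniform, then
    # correct with importance weights w/q so gradients survive resampling.
    w = log_w.exp()
    q = alpha * w + (1 - alpha) / K
    idx = torch.multinomial(q, K, replacement=True)
    new_w = w[idx] / q[idx]
    return particles[idx], (new_w / new_w.sum()).log()

particles = torch.randn(K, S)
log_w = torch.log(torch.ones(K) / K)
particles, log_w = pf_step(particles, log_w, torch.randn(1, 2), torch.randn(1, 8))
```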

Differentiable MPC for End-to-end Planning and Control

The foundations for using Model Predictive Control (MPC) as a differentiable policy class for reinforcement learning in continuous state and action spaces are presented; the MPC policies are shown to be significantly more data-efficient than a generic neural network, and the method is superior to traditional system identification in a setting where the expert is unrealizable.
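
The paper differentiates through the fixed point of an iterative LQR solver; the simplified sketch below conveys the same end-to-end idea with a cruder inner optimizer, unrolled gradient descent on the planning cost, so that an imitation loss on the MPC policy's first action trains the learned dynamics model.

```python
import torch
import torch.nn as nn

S, A, H = 4, 1, 10
dynamics = nn.Linear(S + A, S)                  # learned model whose parameters
                                                # receive end-to-end gradients
def mpc_policy(x0, inner_steps=20, lr=0.1):
    u = torch.zeros(H, A, requires_grad=True)
    for _ in range(inner_steps):                # unrolled inner optimization
        x, cost = x0, 0.0
        for t in range(H):
            x = dynamics(torch.cat([x, u[t]]))
            cost = cost + x.pow(2).sum() + 0.1 * u[t].pow(2).sum()
        g, = torch.autograd.grad(cost, u, create_graph=True)
        u = u - lr * g                          # keep the graph for outer gradients
    return u[0]                                 # first action, MPC-style

x0 = torch.randn(S)
expert_u0 = torch.randn(A)
loss = (mpc_policy(x0) - expert_u0).pow(2).sum()   # imitation loss
loss.backward()                                 # gradients reach dynamics params
```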

Real-World Reinforcement Learning via Multifidelity Simulators

The framework is designed to limit the number of samples used in each successively higher-fidelity (and higher-cost) simulator by allowing a learning agent to run trajectories in the lowest-level simulator that will still provide it with useful information.

Augmenting Physical Simulators with Stochastic Neural Networks: Case Study of Planar Pushing and Bouncing

This paper builds an efficient, generalizable physical simulator with universal uncertainty estimates for two scenarios, planar pushing and ball bouncing, by augmenting an analytical rigid-body simulator with a neural network that learns to model uncertainty as residuals.
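
The residual-augmentation pattern is worth a sketch (the analytic step and dimensions below are stand-ins): a network predicts a correction to the analytical simulator's output, plus a log-variance head so the simulator's uncertainty can be trained with a Gaussian likelihood.

```python
import torch
import torch.nn as nn

def analytic_step(state, action):
    # Stand-in rigid-body prediction for an analytical simulator
    return state + 0.05 * torch.cat([state[..., 2:], action], dim=-1)

class ResidualDynamics(nn.Module):
    def __init__(self, state_dim=4, act_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + act_dim, 64), nn.ReLU(),
            nn.Linear(64, 2 * state_dim))        # residual mean and log-variance

    def forward(self, state, action):
        base = analytic_step(state, action)
        mu, logvar = self.net(torch.cat([state, action], dim=-1)).chunk(2, -1)
        return base + mu, logvar                 # corrected mean + uncertainty

model = ResidualDynamics()
state, action = torch.randn(16, 4), torch.randn(16, 2)
pred, logvar = model(state, action)
target = torch.randn(16, 4)
# Gaussian negative log-likelihood trains both the residual and its variance
nll = 0.5 * ((pred - target).pow(2) / logvar.exp() + logvar).sum(-1).mean()
nll.backward()
```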
...