A Data-Efficient Framework for Training and Sim-to-Real Transfer of Navigation Policies

@inproceedings{Bharadhwaj2019ADF,
  title={A Data-Efficient Framework for Training and Sim-to-Real Transfer of Navigation Policies},
  author={Homanga Bharadhwaj and Zihan Wang and Yoshua Bengio and Liam Paull},
  booktitle={2019 International Conference on Robotics and Automation (ICRA)},
  year={2019},
  pages={782--788}
}
Learning effective visuomotor policies for robots purely from data is challenging, but also appealing, since a learning-based system should not require manual tuning or calibration. In the case of a robot operating in a real environment, the training process can be costly, time-consuming, and even dangerous, since failures are common at the start of training. For this reason, it is desirable to leverage simulation and off-policy data to the extent possible to train the robot. In this… 


Citations

DeepRacer: Autonomous Racing Platform for Experimentation with Sim2Real Reinforcement Learning
TLDR
This work demonstrates, with DeepRacer, how a 1/18th scale car can learn to drive autonomously using RL with a monocular camera; it is the first successful large-scale deployment of deep reinforcement learning on a robotic control agent that uses only raw camera images as observations and a model-free learning method to perform robust path planning.
DeepRacer: Educational Autonomous Racing Platform for Experimentation with Sim2Real Reinforcement Learning
TLDR
This work demonstrates how a 1/18th scale car can learn to drive autonomously using RL with a monocular camera and is the first successful large-scale deployment of deep reinforcement learning on a robotic control agent that uses only raw camera images as observations and a model-free learning method to perform robust path planning.
Using Reinforcement Learning and Simulation to Develop Autonomous Vehicle Control Strategies
TLDR
This research demonstrates the ability to learn new control strategies while in a simulated environment without the need for large amounts of real-world data and results in a low cost, low data solution that enables control of a full-sized, self-driving passenger vehicle.
Learning from Imperfect Demonstrations via Adversarial Confidence Transfer
TLDR
This work relies on demonstrations, along with their confidence values, from a different but corresponding environment to learn a confidence predictor for the environment in which a policy is to be learned (the target environment, where only unlabeled demonstrations are available), and learns a common latent space through adversarial distribution matching of multi-length partial trajectories.
MANGA: Method Agnostic Neural-policy Generalization and Adaptation
TLDR
This work introduces MANGA: Method Agnostic Neural-policy Generalization and Adaptation, that trains dynamics conditioned policies and efficiently learns to estimate the dynamics parameters of the environment given off-policy state-transition rollouts in the environment.
Dynamics-Aware Latent Space Reachability for Exploration in Temporally-Extended Tasks
TLDR
The proposed self-supervised exploration algorithm, which learns a dynamics-aware manifold of reachable states, can achieve 20% superior performance on average compared to existing baselines on a set of challenging robotic environments, including on a real robot manipulation task.
LEAF: Latent Exploration Along the Frontier
TLDR
An exploration framework that learns a dynamics-aware manifold of reachable states and incorporates a curriculum for sampling easier goals before more difficult ones, demonstrating that the proposed self-supervised exploration algorithm achieves superior performance compared to existing baselines on a set of challenging robotic environments.
Meta-Reinforcement Learning for Robotic Industrial Insertion Tasks
TLDR
This paper studies how to use meta-reinforcement learning to solve the bulk of the problem in simulation by solving a family of simulated industrial insertion tasks and then adapt policies quickly in the real world.
Learning to Navigate from Simulation via Spatial and Semantic Information Synthesis
TLDR
A visual information pyramid (VIP) model is proposed to systematically investigate a practical environment representation and it is suggested that this representation behaves best in both simulated and real-world scenarios.
On Assessing the Usefulness of Proxy Domains for Developing and Evaluating Embodied Agents
TLDR
This paper attempts to clarify the role of proxy domains and establishes new proxy usefulness (PU) metrics to compare the usefulness of different proxy domains; it proposes the relative predictive PU to assess the predictive ability of a proxy domain and the learning PU to quantify the usefulness of a proxy as a tool to generate learning data.

References

Showing 1-10 of 52 references
Sim-to-Real Robot Learning from Pixels with Progressive Nets
TLDR
This work proposes using progressive networks to bridge the reality gap and transfer learned policies from simulation to the real world, and presents an early demonstration of this approach with a number of experiments in the domain of robot manipulation that focus on bridging the reality gap.
Asymmetric Actor Critic for Image-Based Robot Learning
TLDR
This work exploits the full state observability in the simulator to train better policies which take as input only partial observations (RGBD images) and combines this method with domain randomization and shows real robot experiments for several tasks like picking, pushing, and moving a block.
Adversarial discriminative sim-to-real transfer of visuo-motor policies
TLDR
An adversarial discriminative sim-to-real transfer approach to reduce the amount of labeled real data required in visuo-motor policies transferred to real environments, achieving a 97.8% success rate and 1.8 cm control accuracy.
Deep visual foresight for planning robot motion
  • Chelsea Finn, S. Levine
  • Computer Science
    2017 IEEE International Conference on Robotics and Automation (ICRA)
  • 2017
TLDR
This work develops a method for combining deep action-conditioned video prediction models with model-predictive control that uses entirely unlabeled training data and enables a real robot to perform nonprehensile manipulation (pushing objects) and to handle novel objects not seen during training.
3D Simulation for Robot Arm Control with Deep Q-Learning
TLDR
This work presents an approach which uses 3D simulations to train a 7-DOF robotic arm in a control task without any prior knowledge, and presents preliminary results in direct transfer of policies over to a real robot, without any further training.
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
TLDR
By randomizing the dynamics of the simulator during training, this paper is able to develop policies that are capable of adapting to very different dynamics, including ones that differ significantly from the dynamics on which the policies were trained.
(CAD)$^2$RL: Real Single-Image Flight without a Single Real Image
TLDR
This paper proposes a learning method, called CAD$^2$RL, which can be used to perform collision-free indoor flight in the real world while being trained entirely on 3D CAD models, and shows that it can train a policy that generalizes to the real world without requiring the simulator to be particularly realistic or high-fidelity.
Universal Planning Networks
TLDR
This work finds that the representations learned are not only effective for goal-directed visual imitation via gradient-based trajectory optimization, but can also provide a metric for specifying goals using images.
Domain Randomization and Generative Models for Robotic Grasping
TLDR
A novel data generation pipeline for training a deep neural network to perform grasp planning that applies the idea of domain randomization to object synthesis and can achieve a >90% success rate on previously unseen realistic objects at test time in simulation despite having only been trained on random objects.
Towards Adapting Deep Visuomotor Representations from Simulated to Real Environments
TLDR
This work proposes a novel domain adaptation approach for robot perception that adapts visual representations learned on a large easy-to-obtain source dataset to a target real-world domain, without requiring expensive manual data annotation of real world data before policy search.