Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning

Carlos Florensa, Jonathan Tremblay, Nathan D. Ratliff, Animesh Garg, Fabio Ramos, Dieter Fox

2020 IEEE International Conference on Robotics and Automation (ICRA)
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. Reinforcement learning approaches, by contrast, can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods…
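The combination the abstract describes can be pictured as an uncertainty-gated switch: act with a model-based controller while perception is confident, and hand control to a learned policy where the model cannot be trusted. The sketch below is a loose illustration of that idea, not the paper's algorithm; `model_based_ctrl`, `rl_policy`, and the threshold are hypothetical stand-ins.

```python
import numpy as np

def gated_action(state, pose_uncertainty, model_based_ctrl, rl_policy,
                 threshold=0.1):
    """Use the model-based controller while perception uncertainty is
    low; switch to the learned policy once it exceeds the threshold."""
    if pose_uncertainty < threshold:
        return model_based_ctrl(state)
    return rl_policy(state)

# Toy controllers for illustration (not from the paper).
reach = lambda s: -0.5 * s       # proportional controller toward the origin
learned = lambda s: np.tanh(s)   # stand-in for a trained RL policy
```

With low uncertainty the proportional controller acts; with high uncertainty the learned policy takes over for the final, uncertain portion of the task.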


Learning Skills to Patch Plans Based on Inaccurate Models

This paper proposes a method that improves the efficiency of sub-optimal planners built on approximate but simple and fast models: when unexpected transitions are observed, the system switches to a model-free policy, patching the plan with a local policy only where needed.

CMAX++ : Leveraging Experience in Planning and Execution using Inaccurate Models

CMAX++ is proposed, an approach that leverages real-world experience to improve the quality of resulting plans over successive repetitions of a robotic task and achieves this by integrating model-free learning using acquired experience with model-based planning using the potentially inaccurate model.

Affordance Learning from Play for Sample-Efficient Policy Learning

This work proposes a novel approach that extracts a self-supervised visual affordance model from human teleoperated play data and leverages it to enable efficient policy learning and motion planning. It combines model-based planning with model-free deep reinforcement learning (RL) to learn policies that favor the same object regions favored by people, while requiring minimal robot interactions with the environment.

Bayesian Controller Fusion: Leveraging Control Priors in Deep Reinforcement Learning for Robotics

Bayesian Controller Fusion is a promising approach for combining the complementary strengths of deep RL and traditional robotic control, surpassing what either can achieve independently.
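The fusion alluded to here can be illustrated with the standard precision-weighted product of two Gaussian action distributions, one from the control prior and one from the RL policy. This is a generic sketch under that assumption, not the paper's exact formulation.

```python
def fuse_gaussians(mu_prior, var_prior, mu_rl, var_rl):
    """Product of two 1-D Gaussians: the fused mean is the
    precision-weighted average of the means, and the fused
    variance is smaller than either input variance."""
    precision = 1.0 / var_prior + 1.0 / var_rl
    var = 1.0 / precision
    mu = var * (mu_prior / var_prior + mu_rl / var_rl)
    return mu, var
```

With equal variances the fused action is simply the midpoint of the two means; as the RL policy grows more confident (smaller variance), the fused action drifts toward it.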

Residual Learning from Demonstration: Adapting DMPs for Contact-rich Manipulation

This work proposes "Residual Learning from Demonstration" (rLfD), a framework that combines DMPs with Reinforcement Learning (RL) to learn a residual correction policy, and suggests that applying residual learning directly in task space and operating on the full pose of the robot can significantly improve the overall performance of DMPs.

Motion Planner Augmented Action Spaces for Reinforcement Learning

This work proposes to combine the benefits of both approaches by formulating a novel action space for continuous robotic control tasks that equips RL agents with long-horizon planning capabilities and trains model-free RL agents that learn to decide when to make use of the motion planner purely from reward signals.

Residual Robot Learning for Object-Centric Probabilistic Movement Primitives

This work combines ProMPs with recently introduced Residual Reinforcement Learning (RRL) to account for corrections in both position and orientation during task execution, and learns a residual on top of a nominal ProMP trajectory with Soft Actor-Critic to reduce the search space for RRL.

Proactive Action Visual Residual Reinforcement Learning for Contact-Rich Tasks Using a Torque-Controlled Robot

This paper considers incorporating operational space visual and haptic information into a reinforcement learning (RL) method to solve the target uncertainty problems in unstructured environments and proposes a novel idea of introducing a proactive action to solve a partially observable Markov decision process (POMDP) problem.

Balance Between Efficient and Effective Learning: Dense2Sparse Reward Shaping for Robot Manipulation with Environment Uncertainty

The testing results show that the proposed Dense2Sparse method achieves a higher expected reward and success rate than standalone dense or sparse rewards, and that it also has superior tolerance to system uncertainty.
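One way to picture a dense-to-sparse switch is a reward function that changes form after a fixed point in training: a shaped distance penalty early on, then a binary success signal. The switching episode and goal tolerance below are illustrative assumptions, not values from the paper.

```python
def dense2sparse_reward(dist_to_goal, episode_idx,
                        switch_episode=50, goal_tol=0.05):
    """Dense shaped reward early in training, sparse success
    reward after the switch (parameters are assumptions)."""
    if episode_idx < switch_episode:
        return -dist_to_goal                         # dense: negative distance
    return 1.0 if dist_to_goal < goal_tol else 0.0   # sparse: success bonus
```

The dense phase guides early exploration; the sparse phase stops the shaping term from biasing the final policy.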

Combining Learning from Demonstration with Learning by Exploration to Facilitate Contact-Rich Tasks

This study focuses on combining visual servoing-based learning from demonstration (LfD) and force-based learning by exploration (LbE) to enable fast and intuitive programming of contact-rich tasks with minimal user effort.



Residual Policy Learning

It is argued that RPL is a promising approach for combining the complementary strengths of deep reinforcement learning and robotic control, pushing the boundaries of what either can achieve independently.
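The core of residual policy learning is additive: the learned network outputs a correction that is added to the action of a fixed, hand-designed base controller. A minimal sketch, with toy stand-ins for both components:

```python
import numpy as np

def residual_policy(state, base_controller, residual_net, scale=1.0):
    """Additive residual form: base action plus learned correction."""
    return base_controller(state) + scale * residual_net(state)

# Illustrative stand-ins (not from the paper).
base = lambda s: -s                          # toy proportional controller
residual = lambda s: 0.1 * np.ones_like(s)   # stand-in for a learned residual
```

Because the residual starts near zero, the combined policy initially behaves like the base controller and only departs from it where the correction helps.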

Goal-conditioned Imitation Learning

Different approaches to incorporating demonstrations are investigated to drastically speed up convergence to a policy able to reach any goal, also surpassing the performance of agents trained with other Imitation Learning algorithms.

Composable Deep Reinforcement Learning for Robotic Manipulation

This paper shows that policies learned with soft Q-learning can be composed to create new policies, and that the optimality of the resulting policy can be bounded in terms of the divergence between the composed policies.
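The composition result can be sketched on a discrete action grid: average the soft Q-functions of the two tasks and act according to the Boltzmann distribution of the average. This is a simplified illustration of the idea, not the paper's continuous-action implementation.

```python
import numpy as np

def composed_policy(q_a, q_b, temperature=1.0):
    """Compose two soft Q-functions over a discrete action set by
    averaging them and taking the resulting Boltzmann distribution."""
    q_comp = 0.5 * (q_a + q_b)
    logits = q_comp / temperature
    logits -= logits.max()        # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()
```

Actions that score well under both tasks receive high composed probability, which is the intuition behind the bounded-suboptimality result the summary mentions.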

Overcoming Exploration in Reinforcement Learning with Demonstrations

This work uses demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm.

End-to-End Training of Deep Visuomotor Policies

This paper develops a method that can be used to learn policies that map raw image observations directly to torques at the robot's motors, trained using a partially observed guided policy search method, with supervision provided by a simple trajectory-centric reinforcement learning method.

Benchmarking Model-Based Reinforcement Learning

This paper gathers a wide collection of MBRL algorithms and proposes over 18 benchmarking environments specially designed for MBRL, and describes three key research challenges for future MBRL research: the dynamics bottleneck, the planning horizon dilemma, and the early-termination dilemma.

Visual Reinforcement Learning with Imagined Goals

An algorithm is proposed that acquires general-purpose skills by combining unsupervised representation learning and reinforcement learning of goal-conditioned policies, efficient enough to learn policies that operate on raw image observations and goals for a real-world robotic system, and substantially outperforms prior techniques.

Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning

It is demonstrated that neural network dynamics models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a model-based reinforcement learning algorithm, producing stable and plausible gaits that accomplish various complex locomotion tasks.
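The model-plus-MPC loop described here can be sketched with the simplest planner, random shooting: sample candidate action sequences, roll each out through the learned dynamics model, and execute the first action of the best-scoring sequence. Everything below (1-D state, action bounds, horizon) is an illustrative assumption.

```python
import numpy as np

def mpc_random_shooting(state, dynamics, reward, horizon=5,
                        n_samples=100, rng=None):
    """Random-shooting MPC: score sampled action sequences under a
    (learned) dynamics model and return the best first action."""
    rng = np.random.default_rng(0) if rng is None else rng
    best_ret, best_a0 = -np.inf, None
    for _ in range(n_samples):
        s, ret = state, 0.0
        actions = rng.uniform(-1.0, 1.0, size=horizon)
        for a in actions:
            s = dynamics(s, a)    # learned model stands in here
            ret += reward(s, a)
        if ret > best_ret:
            best_ret, best_a0 = ret, actions[0]
    return best_a0
```

In the paper's setting the planner replans at every step (receding horizon), and the model-free fine-tuning stage then distills this behavior into a reactive policy.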

Reverse Curriculum Generation for Reinforcement Learning

This work proposes a method to learn goal-oriented tasks without requiring any prior knowledge other than obtaining a single state in which the task is achieved, and generates a curriculum of start states that adapts to the agent's performance, leading to efficient training on goal-oriented tasks.
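The start-state curriculum can be caricatured as random walks outward from the goal: short walks yield easy starts near the goal, longer walks yield harder ones. The paper additionally filters candidate starts by the agent's success rate; that adaptive part is omitted in this loose sketch.

```python
import numpy as np

def grow_starts(goal, n_walks, walk_len, step_size=0.1, rng=None):
    """Sample candidate start states by Brownian-motion walks from
    the goal; longer walks tend to land farther from the goal."""
    rng = np.random.default_rng(0) if rng is None else rng
    starts = []
    for _ in range(n_walks):
        s = np.array(goal, dtype=float)
        for _ in range(walk_len):
            s = s + step_size * rng.standard_normal(s.shape)
        starts.append(s)
    return starts
```

Training on starts that are neither trivially easy nor hopelessly hard keeps the reward signal informative throughout learning.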