Ensuring safety of policies learned by reinforcement: Reaching objects in the presence of obstacles with the iCub

Shashank Pathak, Luca Pulina, Giorgio Metta, Armando Tacchella
2013 IEEE/RSJ International Conference on Intelligent Robots and Systems
Given a stochastic policy learned by reinforcement, we wish to ensure that it can be deployed on a robot with demonstrably low probability of unsafe behavior. Our case study is about learning to reach target objects positioned close to obstacles, and ensuring a reasonably low collision probability. Learning is carried out in a simulator to avoid physical damage in the trial-and-error phase. Once a policy is learned, we analyze it with probabilistic model checking tools to identify and correct… 
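To make the verification step concrete: once a stochastic policy is fixed, the robot and its environment can be viewed as a discrete-time Markov chain (DTMC), and the safety question becomes "what is the probability of eventually reaching an unsafe state?". The sketch below is a minimal illustration of that reachability computation and is not the authors' implementation or any particular model checker's API. The four-state chain, its transition probabilities, the state names, and the 0.2 threshold are all hypothetical.

```python
def reach_probability(transitions, unsafe, goal, tol=1e-12, max_iter=10_000):
    """Probability of eventually reaching an unsafe state from each DTMC state.

    transitions: dict mapping state -> {successor: probability}.
    unsafe, goal: sets of absorbing states (collision / task completed).
    Uses a simple fixed-point iteration; real tools use more robust solvers.
    """
    # Unsafe states have probability 1 by definition, all others start at 0.
    prob = {s: (1.0 if s in unsafe else 0.0) for s in transitions}
    for _ in range(max_iter):
        delta = 0.0
        for s, successors in transitions.items():
            if s in unsafe or s in goal:
                continue  # absorbing states keep their fixed value
            new = sum(p * prob[t] for t, p in successors.items())
            delta = max(delta, abs(new - prob[s]))
            prob[s] = new
        if delta < tol:
            break
    return prob


# Hypothetical DTMC induced by a learned reaching policy (numbers are made up).
dtmc = {
    "start": {"near": 0.9, "collision": 0.1},
    "near": {"target": 0.8, "start": 0.15, "collision": 0.05},
    "target": {"target": 1.0},
    "collision": {"collision": 1.0},
}

probs = reach_probability(dtmc, unsafe={"collision"}, goal={"target"})

# Hypothetical safety requirement: collision probability from "start" <= 0.2.
THRESHOLD = 0.2
is_safe = probs["start"] <= THRESHOLD
print(f"P(collision from start) = {probs['start']:.4f}, safe = {is_safe}")
```

If the computed probability exceeded the threshold, a repair step would adjust the policy's action probabilities in the offending states and re-run the check.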

Verification and repair of control policies for safe reinforcement learning
This work proposes a general-purpose automated methodology to verify risk bounds and repair the policies of agents that learn by reinforcement, and shows that this approach, which is based on probabilistic model checking algorithms and tools, is more effective than comparable ones.
Testing a Learn-Verify-Repair Approach for Safe Human-Robot Interaction
The purpose of the test is to assess whether one can verify that interaction patterns are carried out with negligible human-to-robot collision probability and whether, in the presence of user tuning, strategies which determine offending behaviors can be effectively repaired.
Is verification a requisite for safe adaptive robots?
This paper argues in favour of using formal methods to ensure the safety of deployed stochastic policies learned by robots in unstructured environments, by modelling safety using probabilistic computation tree logic (PCTL) and ensuring such safety via automated repair.
Evaluating probabilistic model checking tools for verification of robot control policies
This paper evaluates PMC tools – namely COMICS, MRMC and PRISM – to investigate safe reinforcement learning in robots, i.e., to establish safety of policies learned considering feedback signals received upon acting in partially unknown environments.
Probabilistic Model Checking of Robots Deployed in Extreme Environments
A framework for probabilistic model checking on a layered Markov model to verify the safety and reliability requirements of such robots, both at pre-mission stage and during runtime.
Safety-critical advanced robots: A survey
Reactive reaching and grasping on a humanoid: Towards closing the action-perception loop on the iCub
A system incorporating a tight integration between computer vision and robot control modules on a complex, high-DOF humanoid robot that can avoid obstacles and other objects detected in the visual stream while reaching for the intended target object.
Safety Monitoring for Autonomous Systems: Interactive Elicitation of Safety Rules. (Moniteurs de sécurité pour des systèmes autonomes: élicitation interactive des règles de sécurité)
This work focuses on solving cases where the synthesis fails to return a set of safe and permissive rules, and three new features are introduced and developed to assist the user in these cases.
Trusting robots : Contributions to dependable autonomous collaborative robotic systems. (Vers des robots collaboratifs autonomes sûrs de fonctionnement)
This HDR manuscript presents the research work of Jeremie Guiochet, carried out at LAAS-CNRS in the Dependable Computing and Fault Tolerance (TSF) team and mainly related to the dependability of collaborative autonomous robotic systems.
Periodic state-machine aware real-time analysis
A method is proposed to analyze the temporal behavior of a component-based architecture in which the components are described by state machines; it computes an accurate worst-case response time by taking the components' state machines into account.


Guaranteed Safe Online Learning via Reachability: tracking a ground target using a quadrotor
J. Gillula, C. Tomlin. 2012 IEEE International Conference on Robotics and Automation, 2012.
The GSOLR framework can be applied to a target tracking problem, in which an observing quadrotor helicopter must keep a target ground vehicle with unknown (but bounded) dynamics inside its field of view at all times, while simultaneously attempting to build a motion model of the target.
Complexity analysis of real-time reinforcement learning applied to finding shortest paths in determi
This report analyzes the complexity of on-line reinforcement learning algorithms, namely asynchronous real-time versions of Q-learning and value-iteration, applied to the problems of reaching any goal state from the given start state and finding shortest paths from all states to a goal state, and proves that they are tractable with only a simple change in the task representation.
Risk-Sensitive Reinforcement Learning Applied to Control under Constraints
A model-free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies by weighting the original value function against the risk; it was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column.
The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms
The complexity of on-line reinforcement-learning algorithms applied to goal-directed exploration tasks is analyzed and it is proved that the algorithms are tractable with only a simple change in the reward structure ("penalizing the agent for action executions") or in the initialization of the values that they maintain.
Lyapunov Design for Safe Reinforcement Learning
This work proposes a method for constructing safe, reliable reinforcement learning agents based on Lyapunov design principles that ensures qualitatively satisfactory agent behavior for virtually any reinforcement learning algorithm and at all times, including while the agent is learning and taking exploratory actions.
Risk-Sensitive Reinforcement Learning
This risk-sensitive reinforcement learning algorithm is based on a very different philosophy and reflects important properties of the classical exponential utility framework, but avoids its serious drawbacks for learning.
The Modular Behavioral Environment for Humanoids and other Robots (MoBeE)
MoBeE, a novel behavioral framework for humanoids and other complex robots, is developed; it integrates elements from vision, planning, and control, facilitating the synthesis of autonomous, adaptive behaviors.
Reinforcement Learning: An Introduction
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
An experimental evaluation of a novel minimum-jerk cartesian controller for humanoid robots
This paper describes the design of a Cartesian controller for a generic robot manipulator that deals with a large number of degrees of freedom, produces smooth, human-like motion, and is able to compute the trajectory on-line.
DTMC Model Checking by SCC Reduction
A model checking algorithm for DTMCs that also supports the generation of counterexamples is introduced. It is based on the detection and abstraction of strongly connected components, and offers abstract counterexamples that can be interactively refined by the user.