• Corpus ID: 23763602

Online adaptation to human engagement perturbations in simulated human-robot interaction using hybrid reinforcement learning

Authors: Theodore Tsitsimis, George Velentzas, Mehdi Khamassi, Costas S. Tzafestas
Abstract: Dynamic uncontrolled human-robot interaction requires robots to be able to adapt to changes in the human’s behavior and intentions. Among relevant signals, non-verbal cues such as the human’s gaze can provide the robot with important information about the human’s current engagement in the task, and whether the robot should continue its current behavior or not. In a previous work [1] we proposed an active exploration algorithm for reinforcement learning where the reward function is the…


Active Exploration and Parameterized Reinforcement Learning Applied to a Simulated Human-Robot Interaction Task

This work proposes an active exploration algorithm for RL in structured (parameterized) continuous action space and shows that it outperforms continuous parameterized RL both without active exploration and with active exploration based on uncertainty variations measured by a Kalman-Q-learning algorithm.

Evaluating the Engagement with Social Robots

This paper introduces a set of metrics useful in direct, face to face scenarios, based on the behaviors analysis of the human partners, and shows how such metrics are useful to assess how the robot is perceived by humans and how this perception changes according to the behaviors shown by the social robot.

Policy search for motor primitives in robotics

A novel EM-inspired algorithm for policy learning, particularly well-suited to dynamical system motor primitives, is introduced and applied to motor learning, where it learns a complex Ball-in-a-Cup task on a real Barrett WAM™ robot arm.

Robot Skill Learning: From Reinforcement Learning to Evolution Strategies

It is striking that PI² and (μ_W, λ)-ES share a common core, and that the simpler algorithm converges faster and leads to similar or lower final costs, which is due to a third trend in robot skill learning: the predominant use of dynamic movement primitives.

Reinforcement learning in robotics: A survey

This article attempts to strengthen the links between the two research communities by surveying work in reinforcement learning for behavior generation in robots, highlighting both key challenges in robot reinforcement learning and notable successes.

Deep Reinforcement Learning in Parameterized Action Space

This paper represents a successful extension of deep reinforcement learning to the class of parameterized action space MDPs within the domain of simulated RoboCup soccer, which features a small set of discrete action types each of which is parameterized with continuous variables.
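To make the idea of a parameterized action space concrete, here is a minimal Python sketch of one way to represent it: a discrete action type paired with its own continuous parameters. The action names, parameter names, bounds, and the `select_action` helper are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical action types and parameter bounds (illustration only).
ACTION_SPECS = {
    "dash": {"power": (0.0, 100.0), "direction": (-180.0, 180.0)},
    "turn": {"direction": (-180.0, 180.0)},
    "kick": {"power": (0.0, 100.0), "direction": (-180.0, 180.0)},
}


@dataclass
class ParameterizedAction:
    kind: str      # discrete action type
    params: dict   # continuous parameters belonging to that type


def clip(value, low, high):
    """Keep a continuous parameter inside its allowed interval."""
    return max(low, min(high, value))


def select_action(type_scores, proposed_params):
    """Pick the highest-scoring discrete type, then clip its parameters to bounds."""
    kind = max(type_scores, key=type_scores.get)
    spec = ACTION_SPECS[kind]
    params = {
        name: clip(proposed_params[kind][name], low, high)
        for name, (low, high) in spec.items()
    }
    return ParameterizedAction(kind, params)
```

In this sketch a learner would supply one score per action type and one parameter proposal per type; only the parameters of the chosen type are used.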

Modeling choice and reaction time during arbitrary visuomotor learning through the coordination of adaptive working memory and reinforcement learning

A dual-system computational model is developed that can predict both performance and reaction times during learning of a stimulus–response association task. A model is also proposed for QL and BWM coordination such that the expensive memory manipulation is under control of, among others, the level of convergence of the habitual learning.

Reinforcement Learning in Continuous Action Spaces

  • H. van Hasselt, M. Wiering
  • Computer Science
    2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
  • 2007
This work presents a new class of algorithms named continuous actor critic learning automaton (CACLA) that can handle continuous states and actions and shows that CACLA performs much better than the other algorithms, especially when it is combined with a Gaussian exploration method.
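The core CACLA update, as it is commonly described, moves the actor toward the explored action only when the temporal-difference error is positive. The sketch below illustrates that rule with Gaussian exploration on a toy problem. It assumes a single state, one-step episodes, and a quadratic reward, none of which come from the paper; the function name `cacla_train` and all hyperparameter values are likewise hypothetical.

```python
import random


def cacla_train(target=1.0, steps=2000, seed=0,
                alpha=0.1, beta=0.1, sigma=0.2):
    """Toy single-state CACLA-style loop (illustrative assumptions throughout)."""
    rng = random.Random(seed)
    actor = 0.0   # action currently proposed by the actor
    critic = 0.0  # critic's estimate of the expected reward
    for _ in range(steps):
        # Gaussian exploration around the actor's current output.
        action = rng.gauss(actor, sigma)
        # Assumed toy reward: highest when the action hits the target.
        reward = -(action - target) ** 2
        # TD error; episodes end after one step, so there is no bootstrap term.
        delta = reward - critic
        critic += alpha * delta
        # Update the actor only if the action turned out better than expected.
        if delta > 0:
            actor += beta * (action - actor)
    return actor, critic
```

Calling `cacla_train()` returns the final actor output and critic estimate; in this toy setting the actor should drift toward `target`.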

Meta-learning in Reinforcement Learning

Reinforcement Learning with Parameterized Actions

The Q-PAMDP algorithm for learning in Markov decision processes with parameterized actions with continuous parameters is introduced, shown to converge to a local optimum, and compared to direct policy search in the goal-scoring and Platform domains.