Corpus ID: 6820915

Reinforcement Learning for Humanoid Robotics

@inproceedings{Peters2003ReinforcementLF,
  title={Reinforcement Learning for Humanoid Robotics},
  author={Jan Peters and Sethu Vijayakumar and Stefan Schaal},
  year={2003}
}
Reinforcement learning offers one of the most general frameworks for taking traditional robotics towards true autonomy and versatility. [...] Methods can be coarsely classified into three categories: greedy methods, ‘vanilla’ policy gradient methods, and natural gradient methods. We argue that greedy methods are not likely to scale to the domain of humanoid robotics, as they are problematic when used with function approximation. ‘Vanilla’ policy gradient methods, on the other hand, have…
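A minimal sketch of the ‘vanilla’ policy gradient idea mentioned above: a REINFORCE-style score-function estimator for a Gaussian policy on a toy one-step problem. The policy, the reward function, and all names here are illustrative assumptions, not taken from the paper.

```python
import random

# Toy one-step task: action a ~ N(theta, sigma^2), reward r = -(a - 3)^2.
# Vanilla policy gradient estimate: grad = r * d/dtheta log pi(a),
# with d/dtheta log N(a; theta, sigma) = (a - theta) / sigma^2.
def reinforce(theta=0.0, sigma=1.0, lr=0.01, episodes=2000, seed=0):
    rng = random.Random(seed)
    for _ in range(episodes):
        a = rng.gauss(theta, sigma)
        r = -(a - 3.0) ** 2
        grad = r * (a - theta) / sigma ** 2    # score-function estimator
        theta += lr * grad                     # stochastic gradient ascent
    return theta
```

With these settings the mean parameter drifts toward the optimum at 3; in practice a baseline is subtracted from the reward to reduce the estimator's variance.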
Policy search for motor primitives in robotics
TLDR
A novel EM-inspired algorithm for policy learning that is particularly well-suited for dynamical system motor primitives is introduced and applied in the context of motor learning and can learn a complex Ball-in-a-Cup task on a real Barrett WAM™ robot arm.
Policy gradient learning for a humanoid soccer robot
TLDR
It is demonstrated that an extension of the classic policy gradient algorithm that takes parameter relevance into account allows for better solutions when only a few experiments are available, and that it converges faster than the standard algorithm.
Reinforcement learning for balancer embedded humanoid locomotion
TLDR
A new learning-walking scheme is presented in which a humanoid robot is embedded with a primitive balancing controller for safety; the results demonstrate that non-hierarchical RL algorithms with the structured function approximator are much faster than the hierarchical RL algorithm.
Policy Search for Motor Primitives in Robotics
TLDR
This paper extends previous work on policy learning from the immediate reward case to episodic reinforcement learning, resulting in a general, common framework also connected to policy gradient methods and yielding a novel algorithm for policy learning that is particularly well-suited for dynamic motor primitives.
Policy Gradient Methods for Robotics
  • Jan Peters, S. Schaal
  • Engineering, Computer Science
    2006 IEEE/RSJ International Conference on Intelligent Robots and Systems
  • 2006
TLDR
An overview of learning with policy gradient methods for robotics is given, with a strong focus on recent advances in the field, and it is shown how the most recently developed methods can significantly improve learning performance.
Survey of Model-Based Reinforcement Learning: Applications on Robotics
TLDR
It is argued that, by employing model-based reinforcement learning, the currently limited adaptability of robotic systems can be expanded, and that model-based reinforcement learning exhibits advantages that make it more applicable to real-life use cases than model-free methods.
Scaling Reinforcement Learning Paradigms for Motor Control
TLDR
This poster looks at promising approaches that can potentially scale, suggests a novel formulation of the actor-critic algorithm that takes steps towards alleviating the current shortcomings, and proves that Kakade’s ‘average natural policy gradient’ is indeed the true natural gradient.
Convergence Analysis of Reinforcement Learning Approaches to Humanoid Locomotion
TLDR
Preliminary work evaluates the convergence of a variety of temporal difference learning algorithms on a simulation of a simple inverted pendulum, visualizing the value and control-action functions; the results show that the learning performance of TD(λ) is significantly better than that of TD(0) and a stochastic gradient algorithm (SGA).
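As a rough illustration of the TD(λ) family compared in this work, here is tabular TD(λ) with accumulating eligibility traces on the standard five-state random walk — a toy stand-in, not the paper's inverted-pendulum simulation:

```python
import random

# Tabular TD(lambda) with accumulating eligibility traces on a 5-state
# random walk: non-terminal states 0..4, start in the middle, terminate
# left with reward 0 or right with reward 1.
def td_lambda(lam=0.8, alpha=0.1, gamma=1.0, episodes=500, seed=0):
    rng = random.Random(seed)
    n = 5
    v = [0.0] * n
    for _ in range(episodes):
        s, e = 2, [0.0] * n
        while True:
            s2 = s + (1 if rng.random() < 0.5 else -1)
            r = 1.0 if s2 == n else 0.0
            done = s2 < 0 or s2 == n
            target = r + (0.0 if done else gamma * v[s2])
            delta = target - v[s]              # TD error
            e[s] += 1.0                        # accumulating trace
            for i in range(n):
                v[i] += alpha * delta * e[i]   # credit all visited states
                e[i] *= gamma * lam            # decay traces
            if done:
                break
            s = s2
    return v
```

The true values for this chain are 1/6, 2/6, …, 5/6; setting `lam=0` recovers TD(0), which propagates credit only one step per update.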
Reinforcement learning of motor skills with policy gradients
TLDR
This paper examines learning of complex motor skills with human-like limbs, and combines the idea of modular motor control by means of motor primitives as a suitable way to generate parameterized control policies for reinforcement learning with the theory of stochastic policy gradient learning.
Towards a common implementation of reinforcement learning for multiple robotic tasks
TLDR
A practical core implementation of RL is presented that enables the learning process for multiple robotic tasks with minimal or no per-task tuning, together with a novel approach to action selection, called Q-biased softmax regression (QBIASSR), that takes advantage of the structure of the state space by attending to the physical variables involved.
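QBIASSR itself is specific to that paper, but it builds on standard softmax (Boltzmann) action selection over Q-values, which can be sketched as follows — the bias term from the paper is not reproduced here:

```python
import math
import random

# Softmax (Boltzmann) action selection: sample an action with probability
# proportional to exp(Q(a) / temperature). Lower temperature -> greedier.
def softmax_action(q_values, temperature=1.0, rng=random):
    m = max(q_values)                          # subtract max for stability
    prefs = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(prefs)
    probs = [p / z for p in prefs]
    x, acc = rng.random(), 0.0
    for action, p in enumerate(probs):         # sample from the distribution
        acc += p
        if x < acc:
            return action
    return len(probs) - 1
```

At low temperature the best-valued action is chosen almost always, while a high temperature approaches uniform exploration.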

References

Showing 1–10 of 35 references
Learning Attractor Landscapes for Learning Motor Primitives
TLDR
By nonlinearly transforming the canonical attractor dynamics using techniques from nonparametric regression, almost arbitrary new nonlinear policies can be generated without losing the stability properties of the canonical system.
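The attractor landscapes described here follow the general pattern of a stable point attractor perturbed by a phase-gated nonlinear term. A minimal sketch, assuming Ijspeert-style point-attractor dynamics with a fixed toy forcing function rather than a learned one:

```python
import math

# Minimal dynamic-movement-primitive sketch: a spring-damper pulled to
# goal g, perturbed by a forcing term f gated by a decaying canonical
# phase x. Because f vanishes as x -> 0, the goal attractor's stability
# is preserved. The forcing term here is a toy function, not learned.
def rollout(y0=0.0, g=1.0, tau=1.0, dt=0.001, steps=5000):
    alpha_z, beta_z, alpha_x = 25.0, 6.25, 8.0   # common gain choices
    y, z, x = y0, 0.0, 1.0
    traj = []
    for _ in range(steps):
        f = 100.0 * x * math.sin(20.0 * x)       # phase-gated toy forcing
        z += dt / tau * (alpha_z * (beta_z * (g - y) - z) + f)
        y += dt / tau * z
        x += dt / tau * (-alpha_x * x)           # canonical system decays to 0
        traj.append(y)
    return traj
```

The trajectory is shaped by the forcing term early on, but once the canonical phase has decayed the system converges to the goal regardless of the perturbation.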
Gradient Descent for General Reinforcement Learning
TLDR
A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide range of new reinforcement-learning algorithms, and allows policy-search and value-based algorithms to be combined, thus unifying two very different approaches to reinforcement learning into a single Value and Policy Search algorithm.
Reinforcement Learning
  • R. Sutton
  • Computer Science
    Handbook of Machine Learning
  • 2018
Reinforcement learning is an approach to artificial intelligence that emphasizes learning by the individual from its interaction with its environment. This contrasts with classical approaches to…
Policy Gradient Methods for Reinforcement Learning with Function Approximation
TLDR
This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
Reinforcement learning for continuous action using stochastic gradient ascent
TLDR
The proposed method is based on stochastic gradient ascent in the policy parameter space: it does not require a model of the environment to be given or learned, does not need to approximate the value function explicitly, and is incremental, requiring only a constant amount of computation per step.
Biped dynamic walking using reinforcement learning
This paper presents some results from a study of biped dynamic walking using reinforcement learning. During this study a hardware biped robot was built, a new reinforcement learning algorithm as well…
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning
This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown…
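A minimal sketch of a REINFORCE update in this article's setting — a single Bernoulli-logistic stochastic unit — with the toy task, learning rate, and baseline schedule as illustrative assumptions:

```python
import math
import random

# REINFORCE for one Bernoulli-logistic stochastic unit: weight update
# dw = lr * (r - baseline) * dln P(y)/dw, where dln P(y)/dw = (y - p) * x
# for firing probability p = sigmoid(w . x). Toy task: constant input
# x = 1, reward 1 whenever the unit fires.
def train(lr=0.5, episodes=300, seed=0):
    rng = random.Random(seed)
    w, baseline = 0.0, 0.0
    for _ in range(episodes):
        p = 1.0 / (1.0 + math.exp(-w))         # sigmoid, x = 1 throughout
        y = 1 if rng.random() < p else 0       # stochastic unit output
        r = 1.0 if y == 1 else 0.0
        w += lr * (r - baseline) * (y - p)     # characteristic eligibility
        baseline = 0.9 * baseline + 0.1 * r    # running reward baseline
    return 1.0 / (1.0 + math.exp(-w))          # P(fire) after training
```

The unit's firing probability approaches 1 as it learns the rewarded action; the running baseline reduces the variance of the update without biasing it.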
Model-Free Least-Squares Policy Iteration
TLDR
A new approach to reinforcement learning is presented that combines least-squares function approximation with policy iteration; it is model-free and completely off-policy, able to use (or reuse) data collected from any source.
Experiments with Infinite-Horizon, Policy-Gradient Estimation
TLDR
Algorithms are presented that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP), based on GPOMDP, an algorithm introduced in a companion paper (Baxter & Bartlett, 2001) that computes biased estimates of the performance gradient in POMDPs.
A Natural Policy Gradient
TLDR
This work provides a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space and shows drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.
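As a sketch of the idea, for a 1-D Gaussian policy the natural gradient can be computed in closed form: in (mu, log sigma) coordinates the Fisher information matrix is F = diag(1/sigma², 2), so preconditioning the vanilla gradient by F⁻¹ rescales each coordinate. The toy reward here is illustrative and not taken from the paper.

```python
import math
import random

# Natural policy gradient for a 1-D Gaussian policy N(mu, sigma^2),
# learning both mu and log sigma. Preconditioning the score-function
# gradient by F^{-1} = diag(sigma^2, 1/2) gives the natural gradient.
# Toy reward: r = -(a - 3)^2, optimum at a = 3.
def natural_pg(mu=0.0, log_sigma=0.0, lr=0.05, episodes=3000, seed=0):
    rng = random.Random(seed)
    for _ in range(episodes):
        sigma = math.exp(log_sigma)
        a = rng.gauss(mu, sigma)
        r = -(a - 3.0) ** 2
        # Vanilla score-function gradients of log pi(a | mu, log sigma):
        g_mu = r * (a - mu) / sigma ** 2
        g_ls = r * (((a - mu) / sigma) ** 2 - 1.0)
        # Natural-gradient step: rescale each coordinate by F^{-1}:
        mu += lr * sigma ** 2 * g_mu
        log_sigma += lr * 0.5 * g_ls
        log_sigma = min(max(log_sigma, -4.0), 1.0)  # keep sigma in range
    return mu
```

The mean converges toward the optimum while the exploration noise shrinks; the key point is that the step size in each parameter is adapted to the local curvature of the policy's distribution space rather than to the raw parameterization.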