Coordinating multiple behaviors that have been independently acquired by reinforcement learning is one of the key issues in scaling the method to larger and more complex robot learning tasks. Directly combining the state spaces of the individual modules (subtasks) requires enormous learning time and introduces hidden states. This paper presents a …
A method is proposed that accomplishes a whole task consisting of multiple subtasks by coordinating behaviors acquired through vision-based reinforcement learning. First, the individual behaviors that achieve the corresponding subtasks are independently acquired by Q-learning, a widely used reinforcement learning method. Each learned behavior can be …
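Since Q-learning recurs throughout these abstracts, a minimal sketch of the tabular update may help. The toy chain environment, reward, and hyperparameters below are illustrative assumptions, not the vision-based robot setup the papers describe.

```python
# Minimal tabular Q-learning sketch. The toy chain environment, reward,
# and hyperparameters are illustrative assumptions, not the vision-based
# robot setup described in the abstracts.
import random

N_STATES, N_ACTIONS = 5, 2      # states 0..4; actions: 0 = left, 1 = right
GOAL = N_STATES - 1
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def greedy(s):
    """Greedy action with random tie-breaking."""
    best = max(Q[s])
    return random.choice([a for a in range(N_ACTIONS) if Q[s][a] == best])

def step(s, a):
    """One transition in a toy chain: reward 1 on reaching the goal."""
    s2 = max(0, min(GOAL, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

for episode in range(500):
    s, done = 0, False
    while not done:
        a = random.randrange(N_ACTIONS) if random.random() < EPSILON else greedy(s)
        s2, r, done = step(s, a)
        # Q-learning update: Q(s,a) <- Q(s,a) + alpha * (target - Q(s,a))
        target = r if done else r + GAMMA * max(Q[s2])
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

print("greedy policy (0=left, 1=right):", [greedy(s) for s in range(N_STATES)])
```

In the coordination setting these abstracts describe, one such table would be learned per subtask; the open problem is how to combine them without a combinatorial blow-up of the joint state space.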
The aim of the Cyber Rodent project is to understand the origins of our reward and affective systems by building artificial agents that share the same intrinsic constraints as natural agents: self-preservation and self-reproduction. A Cyber Rodent is a robot that can search for and recharge from battery packs on the floor and copy its programs to a nearby …
A linearly solvable Markov decision process (LMDP) is a class of optimal control problems in which the Bellman equation can be converted into a linear equation by an exponential transformation of the state value function (Todorov, 2009b). In an LMDP, the optimal value function and the corresponding control policy are obtained by solving an eigenvalue problem …
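A worked sketch of the transformation this abstract refers to, following Todorov (2009): v(x) is the optimal cost-to-go, q(x) the state cost, and p(x'|x) the passive dynamics, which the control reshapes at a KL-divergence cost; the eigenvalue form corresponds to the infinite-horizon average-cost setting.

```latex
% Sketch of the LMDP linearization (after Todorov, 2009).
% v(x): optimal cost-to-go; q(x): state cost; p(x'|x): passive dynamics.
%
% Bellman equation:
%   v(x) = q(x) + min_u [ KL( u(.|x) || p(.|x) ) + E_{x' ~ u(.|x)} v(x') ]
%
% Substituting the desirability z(x) = exp(-v(x)) makes it linear:
\[
  z(x) = e^{-q(x)} \sum_{x'} p(x' \mid x)\, z(x'),
  \qquad
  z = G P z, \quad G = \operatorname{diag}\bigl(e^{-q}\bigr).
\]
% In the infinite-horizon average-cost setting this becomes the
% eigenvalue problem  \lambda z = G P z,  and the optimal policy is
%   u^*(x' | x)  proportional to  p(x' | x) z(x').
```

Solving for the principal eigenvector of GP thus yields both the optimal value function (v = -log z) and the optimal policy in a single linear-algebra step, which is what makes the class "linearly solvable".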
Coevolution has been receiving increased attention as a method for simultaneously developing the control structures of multiple agents. Our ultimate goal is the mutual development of skills through coevolution. The coevolutionary process is, however, often prone to settling into suboptimal strategies. The key to successful coevolution has thus far been …
In this paper, we first discuss the meaning of physical embodiment and the complexity of the environment in the context of multiagent learning. We then propose a vision-based reinforcement learning method that acquires cooperative behaviors in a dynamic environment. We use the robot soccer game initiated by RoboCup [12] to illustrate the effectiveness of our …
The speed and performance of learning depend on the complexity of the learner. A simple learner with few parameters and no internal states can quickly obtain a reactive policy, but its performance is limited. A learner with many parameters and internal states may eventually achieve high performance, but learning may take an enormous amount of time. Therefore, it is …
Embodied evolution is a methodology for evolutionary robotics that mimics the distributed, asynchronous and autonomous properties of biological evolution. The evaluation, selection and reproduction are carried out by and between the robots, without any need for human intervention. In this paper we propose a biologically inspired embodied evolution …
Hierarchical reinforcement learning (RL) algorithms can learn a policy faster than standard RL algorithms. However, the applicability of hierarchical RL algorithms is limited by the fact that the task decomposition has to be performed in advance by the human designer. We propose a Lamarckian evolutionary approach for the automatic development of the learning …
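To make the two-level structure concrete, here is a minimal sketch of the kind of decomposition hierarchical RL exploits: a high-level Q-learner chooses among fixed subtask policies ("options"). The corridor task, the subtask set, and the function names are invented for illustration; this is not the paper's Lamarckian evolutionary mechanism.

```python
# Minimal sketch of a two-level hierarchy: a high-level Q-learner chooses
# among fixed subtask policies ("options"). The corridor task, subtask set,
# and hyperparameters are illustrative assumptions; the paper's Lamarckian
# evolutionary decomposition is not implemented here.
import random

# Low-level subtask policies, assumed already learned: each maps the
# current state (a position 0..9) to a primitive action (-1 or +1).
options = {
    "go_left":  lambda s: -1,
    "go_right": lambda s: +1,
}
names = list(options)

GOAL, ALPHA, GAMMA, EPS = 9, 0.1, 0.95, 0.1
Q = [[0.0] * len(names) for _ in range(10)]  # high-level Q over options

def pick(s):
    """Epsilon-greedy over options, breaking ties randomly."""
    if random.random() < EPS:
        return random.randrange(len(names))
    best = max(Q[s])
    return random.choice([i for i in range(len(names)) if Q[s][i] == best])

def run_option(s, name, horizon=3):
    """Execute one subtask policy for a few steps; reward 1 at the goal."""
    for _ in range(horizon):
        s = max(0, min(GOAL, s + options[name](s)))
        if s == GOAL:
            return s, 1.0, True
    return s, 0.0, False

for episode in range(300):
    s, done = 0, False
    while not done:
        o = pick(s)
        s2, r, done = run_option(s, names[o])
        # Simplified one-step update over option outcomes (a full SMDP
        # update would discount by GAMMA ** steps_taken).
        target = r if done else r + GAMMA * max(Q[s2])
        Q[s][o] += ALPHA * (target - Q[s][o])
        s = s2

print("preferred option per state:",
      [names[max(range(len(names)), key=lambda i: Q[s][i])] for s in range(10)])
```

The design question the abstract targets is who supplies the `options` table: here it is hand-coded, which is exactly the human-designer bottleneck the proposed evolutionary approach aims to remove.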