Corpus ID: 28257125

Learning to Drive a Bicycle Using Reinforcement Learning and Shaping

@inproceedings{Randlv1998LearningTD,
  title={Learning to Drive a Bicycle Using Reinforcement Learning and Shaping},
  author={Jette Randl{\o}v and Preben Alstr{\o}m},
  booktitle={ICML},
  year={1998}
}
We present and solve a real-world problem of learning to drive a bicycle. [...] We solve the problem by online reinforcement learning using the Sarsa(λ) algorithm. Then we solve the composite problem of learning to balance a bicycle and then drive to a goal. In our approach the reinforcement function is independent of the task the agent tries to learn to solve.
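The Sarsa(λ) method the abstract refers to can be sketched in a tabular setting. This is a minimal illustration, not the paper's bicycle setup: the one-state environment at the bottom is a placeholder invented for the example.

```python
import random
from collections import defaultdict

def sarsa_lambda(env_reset, env_step, actions, episodes=50,
                 alpha=0.1, gamma=0.9, lam=0.9, epsilon=0.1):
    """Tabular Sarsa(lambda) with replacing eligibility traces."""
    Q = defaultdict(float)                     # Q[(state, action)] estimates

    def policy(s):
        if random.random() < epsilon:
            return random.choice(actions)      # explore
        return max(actions, key=lambda a: Q[(s, a)])  # greedy

    for _ in range(episodes):
        e = defaultdict(float)                 # eligibility traces
        s = env_reset()
        a = policy(s)
        done = False
        while not done:
            s2, r, done = env_step(s, a)
            a2 = None if done else policy(s2)
            # TD error; terminal states have value 0
            delta = r + (0.0 if done else gamma * Q[(s2, a2)]) - Q[(s, a)]
            e[(s, a)] = 1.0                    # replacing trace
            for key in list(e):
                Q[key] += alpha * delta * e[key]
                e[key] *= gamma * lam          # decay all traces
            s, a = s2, a2
    return Q

# Toy stand-in environment: one state, action 'r' ends the episode
# with reward 1, action 'l' loops with reward 0.
def reset():
    return 0

def step(s, a):
    return (0, 1.0, True) if a == 'r' else (0, 0.0, False)

random.seed(0)
Q = sarsa_lambda(reset, step, ['l', 'r'])
```

After training, the learned values favor the rewarded action, which is all this toy problem can demonstrate; the paper's contribution is combining such an update with a shaping reward so that the same reinforcement function works across tasks.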
Learning a Self-driving Bicycle Using Deep Deterministic Policy Gradient
This paper improves on methods for learning a bicycle that can balance itself and reach any specified location, by proposing a procedure that gradually trains the controller until it can stably balance the bicycle and steer it to any specified place.
Controlling bicycle using deep deterministic policy gradient algorithm
  • Le Pham Tuyen, TaeChoong Chung
  • Engineering, Computer Science
  • 2017 14th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI)
  • 2017
This study focuses on applying a state-of-the-art deep reinforcement learning algorithm called Deep Deterministic Policy Gradient to control the bicycle.
Toward Self-Driving Bicycles Using State-of-the-Art Deep Reinforcement Learning Algorithms
This paper uses a reward function and a deep neural network to build a controller for a bicycle using the DDPG (Deep Deterministic Policy Gradient) algorithm, which is a state-of-the-art deep reinforcement learning algorithm.
Reinforcement learning for bicycle control
To extend Randløv and Alstrøm's work on shaping, this work reimplemented their original experiments using the PyBrain machine learning library and tried its own suite of complex reward functions designed to work well for arbitrary goal destinations.
Reinforcement Learning Model with a Reward Function Based on Human Driving Characteristics
  • Feng Pan, Hong Bao
  • Computer Science
  • 2019 15th International Conference on Computational Intelligence and Security (CIS)
  • 2019
A comparison of the proposed RL model with human drivers shows that the trained agent can follow the preceding vehicle smoothly and safely.
Learning bicycle stunts
This work presents a general approach for simulating and controlling a human character riding a bicycle, and uses NeuroEvolution of Augmenting Topologies (NEAT) to optimize both the parametrization and the parameters of the policies.
Reinforcement-Driven Shaping of Sequence Learning in Neural Dynamics
A recent framework for integrating reinforcement learning and dynamic neural fields is extended by using the principle of shaping, in order to reduce the search space of the learning agent.
Learning Macro-Actions in Reinforcement Learning
A method for automatically constructing macro-actions from primitive actions during the reinforcement learning process, which reinforces the tendency to perform action b after action a whenever that pattern of actions has been rewarded.
The Challenges of Reinforcement Learning in Robotics and Optimal Control
This paper discusses the widely used RL algorithm Q-learning, how it can be adapted to continuous state and action spaces, methods for computing rewards that yield an adaptive optimal controller and accelerate the learning process, and, finally, safe exploration approaches.
A phased reinforcement learning algorithm for complex control problems
The key element of the proposed algorithm is a shaping function defined on a novel position–direction space; it is autonomously constructed once the goal is reached and constrains the exploration area to optimize the policy.

References

SHOWING 1-10 OF 36 REFERENCES
Reward Functions for Accelerated Learning
A methodology for designing reinforcement functions that take advantage of implicit domain knowledge in order to accelerate learning in situated domains characterized by multiple goals, noisy state, and inconsistent reinforcement is proposed.
Training and Tracking in Robotics
The learning system's ability to adapt to changes and to profit from a selected training sequence are explored, both of which are of obvious utility in practical robotics applications.
Reinforcement learning and its application to control
It is argued that for certain types of problems the latter approach, of which reinforcement learning is an example, can yield faster, more reliable learning, while the former approach is relatively inefficient.
Robot shaping: The Hamster Experiment
In this paper we present an example of the application of a technique, which we call robot shaping, to designing and building learning autonomous robots. Our autonomous robot (called HAMSTER1) is a [...]
Robot Shaping: Developing Autonomous Agents Through Learning
This paper connects both simulated and real robots to Alecsys, a parallel implementation of a learning classifier system with an extended genetic algorithm, to demonstrate that classifier systems with genetic algorithms can be practically employed to develop autonomous agents.
Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces
This article proposes a simple and modular technique that can be used to implement function approximators with nonuniform degrees of resolution, so that the value function can be represented with higher accuracy in important regions of the state and action spaces.
Introduction to Reinforcement Learning
In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning.
Roles of Macro-Actions in Accelerating Reinforcement Learning
Although eligibility traces increased the rate of convergence to the optimal value function compared to learning with macro-actions but without eligibility traces, they did not permit the optimal policy to be learned as quickly as it was using macro-actions.
Problem solving with reinforcement learning
This thesis is concerned with practical issues surrounding the application of reinforcement learning techniques to tasks that take place in high dimensional continuous state-space environments. In [...]
Temporal Difference Learning and TD-Gammon
  • G. Tesauro
  • Computer Science
  • J. Int. Comput. Games Assoc.
  • 1995
TD-Gammon is a neural network that trains itself to be an evaluation function for the game of backgammon by playing against itself and learning from the outcome.