Trajectory Optimization for Unknown Constrained Systems using Reinforcement Learning

@inproceedings{Ota2019TrajectoryOF,
  title={Trajectory Optimization for Unknown Constrained Systems using Reinforcement Learning},
  author={Keita Ota and Devesh K. Jha and Tomoaki Oiki and Mamoru Miura and Takashi Nammoto and Daniel Nikovski and Toshisada Mariyama},
  booktitle={2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2019},
  pages={3487--3494}
}
  • Published 13 March 2019
  • Computer Science
In this paper, we propose a reinforcement learning-based algorithm for trajectory optimization for constrained dynamical systems. This problem is motivated by the fact that for most robotic systems, the dynamics may not always be known. Generating smooth, dynamically feasible trajectories could be difficult for such systems. Using sampling-based algorithms for motion planning may result in trajectories that are prone to undesirable control jumps. However, they can usually provide a good… 
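The trade-off the abstract describes — sampling-based planners find feasible paths but produce jerky controls — is often handled in RL by shaping the reward to penalize control jumps. The paper's exact reward is not given in this excerpt; the following is a minimal illustrative sketch (names, weights, and the specific penalty form are assumptions, not the authors' formulation):

```python
import numpy as np

def trajectory_reward(states, actions, goal, w_goal=1.0, w_smooth=0.1):
    """Illustrative reward: reach the goal while discouraging control jumps.

    states:  (T, d_s) array of visited states
    actions: (T, d_a) array of applied controls
    goal:    (d_s,) target state
    """
    # Goal-tracking term: negative distance of the final state to the goal.
    goal_term = -np.linalg.norm(states[-1] - goal)
    # Smoothness term: penalize large changes between consecutive controls,
    # i.e. the "control jumps" that sampling-based planners tend to produce.
    jumps = np.diff(actions, axis=0)
    smooth_term = -np.sum(np.linalg.norm(jumps, axis=1))
    return w_goal * goal_term + w_smooth * smooth_term
```

Under such a reward, a trajectory with constant controls scores higher than one that reaches the same final state with oscillating controls, which is the qualitative behavior the paper is after.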


Deep Reactive Planning in Dynamic Environments
TLDR
Traditional kinematic planning, deep learning, and deep reinforcement learning are combined in a synergistic fashion to generalize to arbitrary environments and allow a robot to learn an end-to-end policy which can adapt to changes in the environment during execution.
Optimizing Trajectories for Highway Driving with Offline Reinforcement Learning
TLDR
This work proposes a Reinforcement Learning-based approach, which learns target trajectory parameters for fully autonomous driving on highways, and demonstrates that this offline trained agent learns to drive smoothly, achieving velocities as close as possible to the desired velocity, while outperforming the other agents.
Efficient Exploration in Constrained Environments with Goal-Oriented Reference Path
TLDR
A deep convolutional network is trained that can predict collision-free paths based on a map of the environment; this is used by a reinforcement learning algorithm that learns to closely follow the path, which allows the trained agent to achieve good generalization while learning faster.
Dual-Arm Robot Trajectory Planning Based on Deep Reinforcement Learning under Complex Environment
TLDR
The trajectory planning of the two manipulators of a dual-arm robot is studied so that the robot can approach a patient in a complex environment, using deep reinforcement learning algorithms that help the robot collect enough reward while exploring the environment.
Learning adaptive control in dynamic environments using reproducing kernel priors with bayesian policy gradients
TLDR
This paper proposes to use vector-valued kernel embeddings (instead of parameter vectors) to represent the policy distribution as features in a non-decreasing Euclidean space, and develops a policy search algorithm over a Bayesian posterior estimate derived from the inner product of a priori Gaussian kernels.
Survivable Hyper-Redundant Robotic Arm with Bayesian Policy Morphing
TLDR
A Bayesian reinforcement learning framework that allows robotic manipulators to recover autonomously from random mechanical failures, making them survivable; it is shown that policy search, in the direction biased by prior experience, significantly improves learning efficiency in terms of sampling requirements.
Reinforcement learning for robot research: A comprehensive review and open issues
TLDR
This review article covers RL algorithms from theoretical background to advanced learning policies in different domains, which accelerate the solution of practical problems in robotics.
Developmentally Synthesizing Earthworm-Like Locomotion Gaits with Bayesian-Augmented Deep Deterministic Policy Gradients (DDPG)
TLDR
The Bayesian actor-critic approach is extended by introducing an augmented prior-based directed bias in policy search, aiding faster parameter learning and reducing sampling requirements, and it also achieves improved kinematic indices across various gaits.
Combining Programming by Demonstration with Path Optimization and Local Replanning to Facilitate the Execution of Assembly Tasks
TLDR
This work focuses on combining programming by demonstration with path optimization and local replanning methods to allow for fast and intuitive programming of assembly tasks that requires minimal user expertise.

References

SHOWING 1-10 OF 27 REFERENCES
Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics
TLDR
A policy search method that uses iteratively refitted local linear models to optimize trajectory distributions for large, continuous problems and can be used to learn complex neural network policies that successfully execute simulated robotic manipulation tasks in partially observed environments with numerous contact discontinuities and underactuation.
Reinforcement learning in robotics: A survey
TLDR
This article attempts to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots by highlighting both key challenges in robot reinforcement learning as well as notable successes.
Continuous control with deep reinforcement learning
TLDR
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Overcoming Exploration in Reinforcement Learning with Demonstrations
TLDR
This work uses demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm.
Learning Robotic Assembly from CAD
TLDR
This work exploits the fact that in modern assembly domains, geometric information about the task is readily available via the CAD design files, and proposes a neural network architecture that can learn to track the motion plan, thereby generalizing the assembly controller to changes in the object positions.
Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards
TLDR
A general and model-free approach for reinforcement learning on real robots with sparse rewards, built upon the Deep Deterministic Policy Gradient algorithm to make use of demonstrations; it outperforms DDPG and does not require engineered rewards.
End-to-End Training of Deep Visuomotor Policies
TLDR
This paper develops a method that can be used to learn policies that map raw image observations directly to torques at the robot's motors, trained using a partially observed guided policy search method, with supervision provided by a simple trajectory-centric reinforcement learning method.
Learning and generalization of motor skills by learning from demonstration
TLDR
A general approach for learning robotic motor skills from human demonstration is provided and how this framework extends to the control of gripper orientation and finger position and the feasibility of this approach is demonstrated.
Trust Region Policy Optimization
TLDR
A method for optimizing control policies, with guaranteed monotonic improvement, by making several approximations to the theoretically-justified scheme, called Trust Region Policy Optimization (TRPO).
Guided Policy Search
TLDR
This work presents a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima, and shows how differential dynamic programming can be used to generate suitable guiding samples, and describes a regularized importance sampled policy optimization that incorporates these samples into the policy search.