Regularizing Action Policies for Smooth Control with Reinforcement Learning

@inproceedings{Mysore2021RegularizingAP,
  title={Regularizing Action Policies for Smooth Control with Reinforcement Learning},
  author={Siddharth Mysore and Bassel Mabsout and Renato Mancuso and Kate Saenko},
  booktitle={2021 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2021},
  pages={1810--1816}
}
A critical problem with the practical utility of controllers trained with deep Reinforcement Learning (RL) is the notable lack of smoothness in the actions learned by the RL policies. This trend often presents itself in the form of control signal oscillation and can result in poor control, high power consumption, and undue system wear. We introduce Conditioning for Action Policy Smoothness (CAPS), an effective yet intuitive regularization on action policies, which offers consistent improvement… 
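The abstract describes CAPS as a regularization on the policy with two components: a temporal term that penalizes action changes across consecutive states, and a spatial term that penalizes action changes under small state perturbations. A minimal NumPy sketch of those two penalties for a deterministic policy (function name, batching convention, and defaults are illustrative, not taken from the paper's code):

```python
import numpy as np

def caps_penalties(policy, states, next_states, sigma=0.05, seed=0):
    """CAPS-style smoothness penalties for a deterministic policy.

    policy: maps a batch of states (N, d_s) to a batch of actions (N, d_a).
    Returns (L_T, L_S):
      L_T -- temporal term, mean ||pi(s_t) - pi(s_{t+1})||
      L_S -- spatial term, mean ||pi(s) - pi(s~)|| with s~ ~ N(s, sigma^2 I)
    """
    rng = np.random.default_rng(seed)
    actions = policy(states)
    # Temporal smoothness: penalize action changes across consecutive states
    l_t = np.mean(np.linalg.norm(actions - policy(next_states), axis=-1))
    # Spatial smoothness: penalize action changes under perturbed nearby states
    perturbed = states + rng.normal(0.0, sigma, size=states.shape)
    l_s = np.mean(np.linalg.norm(actions - policy(perturbed), axis=-1))
    return l_t, l_s
```

During training, weighted versions of these terms are added to the policy optimization objective, so the weights trade task performance against control smoothness.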

Citations

Image-Based Conditioning for Action Policy Smoothness in Autonomous Miniature Car Racing with Reinforcement Learning
TLDR
This paper applies Conditioning for Action Policy Smoothness (CAPS) with image-based input to smooth the control of an autonomous miniature racing car, combining CAPS with sim-to-real transfer methods to stabilize control at higher speeds.
L2C2: Locally Lipschitz Continuous Constraint towards Stable and Smooth Reinforcement Learning
TLDR
By designing the spatio-temporal locally compact space for L2C2 from the state transition at each time step, moderate smoothness can be achieved without loss of expressiveness.
Smooth Exploration for Robotic Reinforcement Learning
TLDR
gSDE is evaluated both in simulation, on PyBullet continuous control tasks, and directly on three different real robots: a tendon-driven elastic robot, a quadruped, and an RC car, allowing training directly on the real robots without loss of performance.
Delayed Reinforcement Learning by Imitation
TLDR
A novel algorithm, Delayed Imitation with Dataset Aggregation (DIDA), builds upon imitation learning methods to learn how to act in a delayed environment from undelayed demonstrations, and is shown empirically to obtain high performance with remarkable sample efficiency on a variety of tasks.
Deep Residual Reinforcement Learning based Autonomous Blimp Control
TLDR
This paper presents a learning-based framework based on deep residual reinforcement learning (DRRL) for the blimp control task, and shows that the agent, trained only in simulation, is robust enough to control an actual blimp in windy conditions.
Autonomous Blimp Control using Deep Reinforcement Learning
TLDR
A deep reinforcement learning (DRL) approach to the blimp control task is presented; initial results in simulation show significant potential of DRL in solving the task, with robustness against moderate wind and parameter uncertainty.
Data-Efficient Deep Reinforcement Learning for Attitude Control of Fixed-Wing UAVs: Field Experiments
TLDR
It is shown that DRL can successfully learn to perform attitude control of a fixed-wing UAV operating directly on the original nonlinear dynamics, requiring as little as three minutes of flight data.
Reactive Stepping for Humanoid Robots using Reinforcement Learning: Application to Standing Push Recovery on the Exoskeleton Atalante
TLDR
A reinforcement learning framework is presented that learns robust standing push recovery for bipedal robots with smooth out-of-the-box transfer to reality, requiring only instantaneous proprioceptive observations.
Latent Imagination Facilitates Zero-Shot Transfer in Autonomous Racing
TLDR
This paper shows that model-based agents capable of learning in imagination substantially outperform model-free agents with respect to performance, sample efficiency, successful task completion, and generalization in real-world autonomous vehicle control tasks where advanced model-free deep RL algorithms fail.
Decentralized Global Connectivity Maintenance for Multi-Robot Navigation: A Reinforcement Learning Approach
TLDR
This work proposes a reinforcement learning approach to develop a decentralized policy shared among multiple robots; connectivity concerns are incorporated into the RL framework as constraints, and behavior cloning is introduced to reduce the exploration complexity of policy optimization.
...

References

SHOWING 1-10 OF 37 REFERENCES
Continuous control with deep reinforcement learning
TLDR
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Benchmarking Deep Reinforcement Learning for Continuous Control
TLDR
This work presents a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure.
Regularization Matters in Policy Optimization
TLDR
It is found that conventional regularization techniques on the policy networks can often bring large improvements in task performance, and the improvement is typically more significant when the task is more difficult.
Deep Reinforcement Learning with Smooth Policy
TLDR
This work develops a new framework -- smoothness-inducing regularization -- that can improve the robustness of the policy against measurement error in the state space, and can be naturally extended to the distributionally robust setting.
Flight Controller Synthesis Via Deep Reinforcement Learning
TLDR
Work summarized in this thesis demonstrates that reinforcement learning can be leveraged to train neural network controllers capable not only of maintaining stable flight, but also of performing precision aerobatic maneuvers in real-world settings.
Control of a Quadrotor With Reinforcement Learning
TLDR
A method to control a quadrotor with a neural network trained using reinforcement learning techniques is presented, along with a new learning algorithm that differs from existing ones in certain aspects and is found to be more applicable to controlling a quadrotor.
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
TLDR
This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, which achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
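For reference, the maximum-entropy framework that soft actor-critic builds on augments the expected return with a policy-entropy bonus, weighted by a temperature parameter α (a standard statement of the objective, paraphrased here from the SAC literature):

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```

The entropy term encourages broad exploration and makes the learned policies notably robust, which is part of why SAC is a common baseline in the smooth-control work above.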
Reinforcement Learning for UAV Attitude Control
TLDR
This work developed an open-source high-fidelity simulation environment to train flight controllers for attitude control of a quadrotor through RL, and used this environment to compare their performance to that of a PID controller, to identify whether using RL is appropriate for high-precision, time-critical flight control.
Sim-to-(Multi)-Real: Transfer of Low-Level Robust Control Policies to Multiple Quadrotors
TLDR
To the best of our knowledge, this is the first work to demonstrate that a simple neural network can learn a robust, stabilizing, low-level quadrotor controller (without the use of a stabilizing PD controller) that is shown to generalize to multiple quadrotors.
Human-level control through deep reinforcement learning
TLDR
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
...