Corpus ID: 159042253

Evolving Rewards to Automate Reinforcement Learning

Aleksandra Faust, Anthony G. Francis, Dar Mehta
Many continuous control tasks have easily formulated objectives, yet using them directly as a reward in reinforcement learning (RL) leads to suboptimal policies. Therefore, many classical control tasks guide RL training using complex rewards, which require tedious hand-tuning. We automate the reward search with AutoRL, an evolutionary layer over standard RL that treats reward tuning as hyperparameter optimization and trains a population of RL agents to find a reward that maximizes the task…
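The abstract describes treating reward parameters as hyperparameters and evolving a population of agents. A minimal toy sketch of that idea follows; it is not the authors' implementation, and the function names, the two reward weights, and the stand-in objective (which peaks when the weights sum to 1.0) are all hypothetical:

```python
import random

def train_and_evaluate(reward_params):
    """Stand-in for training an RL agent under the parameterized reward
    and scoring it on the true task objective. Toy objective: best when
    the reward-shaping weights sum to 1.0."""
    return -abs(sum(reward_params.values()) - 1.0)

def evolve_reward(generations=20, pop_size=8, seed=0):
    rng = random.Random(seed)
    # Each individual is a candidate set of reward-shaping weights
    # (the "hyperparameters" the evolutionary layer searches over).
    population = [{"w_dist": rng.random(), "w_ctrl": rng.random()}
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Select the reward candidates whose trained agents score best.
        scored = sorted(population, key=train_and_evaluate, reverse=True)
        parents = scored[: pop_size // 2]
        # Refill the population with mutated copies of the survivors.
        population = parents + [
            {k: v + rng.gauss(0, 0.05) for k, v in rng.choice(parents).items()}
            for _ in range(pop_size - len(parents))
        ]
    return max(population, key=train_and_evaluate)

best = evolve_reward()
```

In the real system each `train_and_evaluate` call is a full RL training run, which is why the population layer is the expensive part.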
Learning Robotic Manipulation Skills Using an Adaptive Force-Impedance Action Space
This work proposes to factor the learning problem into a hierarchical learning and adaptation architecture to get the best of both worlds in real-world robotics, and combines these components through a bio-inspired action space called AFORCE.
RL-DARTS: Differentiable Architecture Search for Reinforcement Learning
Throughout this training process, the supernet is shown to gradually learn better cells, leading to alternative architectures that are highly competitive with manually designed policies while also verifying previous design choices for RL policies.
Learning to Win, Lose and Cooperate through Reward Signal Evolution
A general framework for optimizing N goals given n reward signals is introduced and it is demonstrated that such an approach allows agents to learn high-level goals such as winning, losing and cooperating from scratch without prespecified reward signals in the game of Pong.
Neural Architecture Evolution in Deep Reinforcement Learning for Continuous Control
Experiments show that the proposed Actor-Critic Neuroevolution algorithm often outperforms the strong Actor-Critic baseline and is capable of automatically finding topologies in a sample-efficient manner which would otherwise have to be found by expensive architecture search.
LIEF: Learning to Influence through Evaluative Feedback
We present a multi-agent reinforcement learning framework where rewards are not only generated by the environment but also by other peers in it through inter-agent evaluative feedback. We show that…
Meta-learning curiosity algorithms
This work proposes a strategy for encoding curiosity algorithms as programs in a domain-specific language and searching, during a meta-learning phase, for algorithms that enable RL agents to perform well in new domains.
AutoRL-TSP: Sistema de Aprendizado por Reforço Automatizado para o Problema do Caixeiro Viajante (Automated Reinforcement Learning System for the Traveling Salesman Problem)
AutoML (Automated Machine Learning) aims at developing techniques to automate the entire machine learning process to obtain a system that fits the problem conditions. In this sense, one of the…
Learning to Seek: Autonomous Source Seeking on a Nano Drone Microcontroller with Deep Reinforcement Learning
Nano drones are uniquely equipped for fully autonomous applications due to their agility, low cost, and small size. However, their constrained form factor limits flight time, sensor payload, and…
Learning to Seek: Deep Reinforcement Learning for Phototaxis of a Nano Drone in an Obstacle Field
This work deploys a deep reinforcement learning model capable of following direct paths even with noisy sensor readings, and demonstrates efficient light seeking, reaching the goal in simulation in 65% fewer steps and with 60% shorter paths than a baseline random-walker algorithm.
Effective, interpretable algorithms for curiosity automatically discovered by evolutionary search
We take the hypothesis that curiosity is a mechanism found by evolution that encourages meaningful exploration early in an agent's life in order to expose it to experiences that enable it to obtain…


Continuous control with deep reinforcement learning
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
BaRC: Backward Reachability Curriculum for Robotic Reinforcement Learning
The Backward Reachability Curriculum (BaRC) begins policy training from states that require a small number of actions to accomplish the task, and expands the initial state distribution backwards in a dynamically-consistent manner once the policy optimization algorithm demonstrates sufficient performance.
Evolution-Guided Policy Gradient in Reinforcement Learning
This work introduces Evolutionary Reinforcement Learning (ERL), a hybrid algorithm that leverages the population of an EA to provide diversified data to train an RL agent, and periodically reinserts the RL agent into the EA population to inject gradient information into the EA.
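The hybrid loop described above can be sketched in miniature; this is a hypothetical illustration, not the paper's code, with one-parameter "policies", a toy fitness that peaks at 3.0, and a stand-in for the gradient update:

```python
import random

def ea_fitness(policy):
    """Toy stand-in for an episode return; best at policy == 3.0."""
    return -abs(policy - 3.0)

def rl_gradient_step(policy, experience):
    """Stand-in for a gradient update driven by the EA population's
    diversified rollouts (here: move toward their mean)."""
    return policy + 0.1 * (sum(experience) / len(experience) - policy)

def erl_loop(iterations=50, pop_size=6, seed=1):
    rng = random.Random(seed)
    population = [rng.uniform(0, 6) for _ in range(pop_size)]  # EA population
    rl_agent = rng.uniform(0, 6)                               # gradient learner
    for step in range(iterations):
        # EA side: keep the fittest half, refill with mutated copies.
        population.sort(key=ea_fitness, reverse=True)
        parents = population[: pop_size // 2]
        population = parents + [rng.choice(parents) + rng.gauss(0, 0.2)
                                for _ in range(pop_size - len(parents))]
        # RL side: learn from the experience the population generated.
        rl_agent = rl_gradient_step(rl_agent, population)
        # Periodically reinsert the RL agent into the EA population.
        if step % 10 == 9:
            population[-1] = rl_agent
    return rl_agent, max(population, key=ea_fitness)

rl_agent, champion = erl_loop()
```

The reinsertion step is the key coupling: if the gradient learner outperforms the evolved individuals, selection keeps it and its information spreads through the population.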
Hindsight Experience Replay
A novel technique is presented which allows sample-efficient learning from rewards that are sparse and binary, avoiding the need for complicated reward engineering; it may be seen as a form of implicit curriculum.
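The core trick behind learning from sparse, binary rewards here is goal relabeling: a failed trajectory is replayed as if a state it actually reached had been the goal. A minimal sketch (hypothetical names and a toy integer state space, not the paper's implementation):

```python
def reward(state, goal):
    """Sparse binary reward: 1 only when the goal is exactly reached."""
    return 1.0 if state == goal else 0.0

def relabel_with_hindsight(trajectory, goal):
    """trajectory: list of (state, action, next_state) tuples.
    Returns the original transitions plus copies relabeled with a goal
    the agent actually achieved (here: the final state), so the sparse
    reward becomes informative even when the episode failed."""
    buffer = []
    for s, a, s2 in trajectory:
        buffer.append((s, a, s2, goal, reward(s2, goal)))   # intended goal
    achieved = trajectory[-1][2]                            # hindsight goal
    for s, a, s2 in trajectory:
        buffer.append((s, a, s2, achieved, reward(s2, achieved)))
    return buffer

traj = [(0, "right", 1), (1, "right", 2)]
replay = relabel_with_hindsight(traj, goal=5)  # episode never reached 5
```

Under the intended goal every transition earns reward 0, but the relabeled copies contain a rewarded transition, giving the learner a gradient signal without any engineered shaping.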
Residual Policy Learning
It is argued that RPL is a promising approach for combining the complementary strengths of deep reinforcement learning and robotic control, pushing the boundaries of what either can achieve independently.
Reverse Curriculum Generation for Reinforcement Learning
This work proposes a method to learn goal-oriented tasks without requiring any prior knowledge other than a single state in which the task is achieved, and generates a curriculum of start states that adapts to the agent's performance, leading to efficient training on goal-oriented tasks.
Soft Actor-Critic Algorithms and Applications
Soft Actor-Critic (SAC), the recently introduced off-policy actor-critic algorithm based on the maximum entropy RL framework, achieves state-of-the-art performance, outperforming prior on-policy and off-policy methods in sample efficiency and asymptotic performance.
Learning to Walk via Deep Reinforcement Learning
A sample-efficient deep RL algorithm based on maximum entropy RL that requires minimal per-task tuning and only a modest number of trials to learn neural network policies is proposed and achieves state-of-the-art performance on simulated benchmarks with a single set of hyperparameters.
Proximal Policy Optimization Algorithms
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective…
Learning to Navigate the Web
A DQN deep reinforcement learning agent, whose Q-value function is approximated with a novel QWeb neural network architecture, is trained and shown to generalize to new instructions on the World of Bits benchmark, on forms with up to 100 elements supporting 14 million possible instructions.