Corpus ID: 49428304

Many-Goals Reinforcement Learning

@article{Veeriah2018ManyGoalsRL,
  title={Many-Goals Reinforcement Learning},
  author={Vivek Veeriah and Junhyuk Oh and Satinder Singh},
  journal={ArXiv},
  year={2018},
  volume={abs/1806.09605}
}
All-goals updating exploits the off-policy nature of Q-learning to update all possible goals an agent could have from each transition in the world, and was introduced into Reinforcement Learning (RL) by Kaelbling (1993). In prior work this was mostly explored in small-state RL problems that allowed tabular representations and where all possible goals could be explicitly enumerated and learned separately. In this paper we empirically explore 3 different extensions of the idea of updating many… 
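
As a concrete illustration of the core idea, here is a minimal tabular sketch of a Kaelbling-style all-goals Q-learning update, assuming a small discrete environment in which every state doubles as a possible goal. All names and hyperparameters (n_states, alpha, the 0/1 goal-conditioned reward) are illustrative choices, not taken from the paper.

import numpy as np

# One Q-table slice per goal; a single behaviour transition updates all of them.
n_states, n_actions = 25, 4
n_goals = n_states            # here every state is also a possible goal
alpha, gamma = 0.1, 0.99

Q = np.zeros((n_goals, n_states, n_actions))   # Q[g, s, a]

def all_goals_update(s, a, s_next):
    """Apply the off-policy Q-learning update for every goal at once."""
    for g in range(n_goals):
        r = 1.0 if s_next == g else 0.0        # goal-conditioned reward
        done = (s_next == g)                   # the episode for goal g ends at g
        target = r if done else r + gamma * Q[g, s_next].max()
        Q[g, s, a] += alpha * (target - Q[g, s, a])

Because Q-learning is off-policy, the single observed transition (s, a, s_next) is valid training data for every goal's value function simultaneously, regardless of which goal the behaviour policy was pursuing.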

Citations

Effective Deep Reinforcement Learning Setups for Multiple Goals on Visual Navigation
TLDR
For visual topologic navigation, combining visual information from the current and goal states through a Hadamard product or a Gated-Attention module allows the network to learn near-optimal navigation policies, and the ϵ-categorical policy is empirically shown to help avoid local minima during training, facilitating convergence to better results.
Learning and Exploiting Multiple Subgoals for Fast Exploration in Hierarchical Reinforcement Learning
TLDR
A multi-goal HRL algorithm is devised, consisting of a high-level Manager policy and a low-level Worker policy, that matches the performance of state-of-the-art HRL methods at substantially reduced training cost.
Learning user-defined sub-goals using memory editing in reinforcement learning
  • G. Lee • Computer Science • ArXiv • 2022
TLDR
A methodology is proposed for steering the agent through user-defined sub-goals on the way to the final goal using memory editing; with it, the agent could be induced to visit novel states indirectly in the tested environments.
Goal-Conditioned Reinforcement Learning with Imagined Subgoals
TLDR
This work proposes to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks, and evaluates the approach on complex robotic navigation and manipulation tasks, showing that it outperforms existing methods by a large margin.
Exploration via Hindsight Goal Generation
TLDR
HGG is introduced, a novel algorithmic framework that generates valuable hindsight goals that are easy for the agent to achieve in the short term and that also have the potential to guide the agent toward the actual goal in the long term.
MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning
TLDR
This work shows that an uncertainty-aware classifier can solve challenging reinforcement learning problems by both encouraging exploration and providing directed guidance toward positive outcomes, and proposes a novel mechanism for obtaining such calibrated, uncertainty-aware classifiers based on an amortized technique for computing the normalized maximum likelihood (NML) distribution.
Planning with Goal-Conditioned Policies
TLDR
This work shows that goal-conditioned policies learned with RL can be incorporated into planning, such that a planner can focus on which states to reach, rather than how those states are reached, and proposes using a latent variable model to compactly represent the set of valid states.
CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning
TLDR
CURIOUS is proposed, an algorithm that leverages a modular Universal Value Function Approximator with hindsight learning to pursue a diversity of goals of different kinds within a single policy, together with an automated curriculum-learning mechanism that biases the agent's attention toward the goals maximizing absolute learning progress.
Non-Parametric Discriminative Rewards
TLDR
An unsupervised learning algorithm is presented that trains agents to achieve perceptually-specified goals using only a stream of observations and actions, simultaneously learning a goal-conditioned policy and a goal-achievement reward function that measures how similar a state is to the goal state.
Open-Ended Reinforcement Learning with Neural Reward Functions
TLDR
This work proposes a different approach that uses reward functions encoded by neural networks to reward more complex behavior in high-dimensional robotic environments and in the pixel-based Montezuma’s Revenge environment.

References

Showing 1-10 of 24 references
Reinforcement Learning with Unsupervised Auxiliary Tasks
TLDR
This paper significantly outperforms the previous state of the art on Atari, averaging 880% expert human performance, and, on a challenging suite of first-person, three-dimensional Labyrinth tasks, achieves a mean speedup in learning of 10× while averaging 87% expert human performance.
Hybrid Reward Architecture for Reinforcement Learning
TLDR
A new method is proposed, called Hybrid Reward Architecture (HRA), which takes as input a decomposed reward function and learns a separate value function for each component reward function, enabling more effective learning.
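The decomposition can be sketched as a Q-network with one head per component reward. This is a hedged illustration of the HRA idea, assuming the environment reward decomposes additively into K components; the class name and layer sizes are illustrative, not the paper's exact architecture.

import torch
import torch.nn as nn

class HRAQNetwork(nn.Module):
    """One Q-head per reward component; acting uses the summed Q-values."""
    def __init__(self, obs_dim, n_actions, n_components):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(128, n_actions) for _ in range(n_components)])

    def forward(self, obs):
        h = self.body(obs)
        q_k = torch.stack([head(h) for head in self.heads])   # [K, batch, actions]
        # Each head is trained against its own component reward r_k;
        # the aggregated Q(s, a) = sum_k Q_k(s, a) is used to select actions.
        return q_k.sum(dim=0), q_k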
End-to-End Training of Deep Visuomotor Policies
TLDR
This paper develops a method for learning policies that map raw image observations directly to torques at the robot's motors, trained with a partially observed guided policy search method and supervised by a simple trajectory-centric reinforcement learning method.
Learning by Playing - Solving Sparse Reward Tasks from Scratch
TLDR
The key idea behind the method is that active (learned) scheduling and execution of auxiliary policies allows the agent to explore its environment efficiently, enabling it to excel at sparse-reward RL.
Learning state representation for deep actor-critic control
TLDR
A new algorithm, Model Learning Deep Deterministic Policy Gradient (ML-DDPG), is proposed that combines RL with state representation learning, i.e., learning a mapping from an input vector to a state before solving the RL task.
Hindsight Experience Replay
TLDR
A novel technique is presented that allows sample-efficient learning from rewards that are sparse and binary, thereby avoiding the need for complicated reward engineering; it may be seen as a form of implicit curriculum.
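The hindsight relabeling mechanism can be sketched in a few lines. This sketch assumes transitions are stored as (state, action, next_state, goal) tuples and that achieved states are directly comparable to goals, a simplification of the paper's setup; relabel_episode and reward_fn are illustrative names.

import random

def reward_fn(achieved_state, goal):
    # Sparse, binary reward as in HER: 0 on success, -1 otherwise.
    return 0.0 if achieved_state == goal else -1.0

def relabel_episode(episode, k=4):
    """episode: list of (s, a, s_next, goal) tuples in time order.
    Returns transitions augmented with hindsight goals ('future' strategy)."""
    replay = []
    for t, (s, a, s_next, g) in enumerate(episode):
        # Original transition with the commanded goal.
        replay.append((s, a, s_next, g, reward_fn(s_next, g)))
        # k extra copies, relabeled with states actually reached later on,
        # so failed episodes still produce successful training signal.
        future_states = [tr[2] for tr in episode[t:]]
        for _ in range(k):
            g_new = random.choice(future_states)
            replay.append((s, a, s_next, g_new, reward_fn(s_next, g_new)))
    return replay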
Human-level control through deep reinforcement learning
TLDR
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Universal Value Function Approximators
TLDR
An efficient technique is developed for supervised learning of universal value function approximators (UVFAs) V(s, g; θ) that generalise not just over states s but also over goals g, and it is demonstrated that a UVFA can successfully generalise to previously unseen goals.
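One of the factorised architectures explored in that paper computes the value as a dot product of separate state and goal embeddings. The sketch below assumes vector-valued states and goals, with illustrative layer sizes and names.

import torch
import torch.nn as nn

class UVFA(nn.Module):
    """Two-stream UVFA: V(s, g) = phi(s) . psi(g), one of the factorised
    architectures explored in the paper; sizes here are illustrative."""
    def __init__(self, state_dim, goal_dim, embed_dim=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU())  # state embedding
        self.psi = nn.Sequential(nn.Linear(goal_dim, embed_dim), nn.ReLU())   # goal embedding

    def forward(self, s, g):
        # Dot product of the two embeddings gives the goal-conditioned value.
        return (self.phi(s) * self.psi(g)).sum(dim=-1)

Because the goal enters as an input rather than indexing a separate table, a single network can be queried for arbitrary goals, including ones never seen during training.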
Continual learning in reinforcement environments
TLDR
CHILD, an agent capable of Continual, Hierarchical, Incremental Learning and Development, is proposed, described, tested, and evaluated in this dissertation; it builds a hierarchical, higher-order neural network that can predict context-dependent temporal sequences and learns sequential-task benchmarks more than two orders of magnitude faster than competing neural-network systems.
Asynchronous Methods for Deep Reinforcement Learning
TLDR
A conceptually simple and lightweight framework for deep reinforcement learning is presented that uses asynchronous gradient descent to optimize deep neural network controllers; asynchronous actor-critic is shown to succeed on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes from visual input.