Gamma-Nets: Generalizing Value Estimation over Timescale

Craig Sherstan, Shibhansh Dohare, J. MacGlashan, J. Günther, P. Pilarski
We present $\Gamma$-nets, a method for generalizing value function estimation over timescale. By using the timescale as one of the estimator's inputs we can estimate value for arbitrary timescales. As a result, the prediction target for any timescale is available and we are free to train on multiple timescales at each timestep. Here we empirically evaluate $\Gamma$-nets in the policy evaluation setting. We first demonstrate the approach on a square wave and then on a robot arm using linear…
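As a rough illustration of the idea (a hypothetical linear sketch with invented names; the paper itself uses neural networks), a single transition can be used to train one estimator on several timescales at once by feeding $\gamma$ in as an input feature:

```python
import numpy as np

def gamma_net_td_update(w, phi, phi_next, r, gammas, alpha=0.1):
    """One TD(0) update of a linear value estimator V(s, gamma) = w . [phi(s); gamma],
    trained on several timescales (gammas) from the same transition.
    Minimal sketch of the Gamma-net idea, not the paper's architecture."""
    for g in gammas:
        x = np.append(phi, g)            # timescale is just another input feature
        x_next = np.append(phi_next, g)
        delta = r + g * (w @ x_next) - (w @ x)   # TD error at timescale g
        w = w + alpha * delta * x
    return w
```

Because the prediction target for any $\gamma$ is available at every step, the loop over `gammas` can cover as many timescales as desired.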
Discount Factor as a Regularizer in Reinforcement Learning
For several Temporal-Difference (TD) learning methods, an explicit equivalence is shown between using a reduced discount factor and adding an explicit regularization term to the algorithm's loss.
Work in Progress: Temporally Extended Auxiliary Tasks
The overall conclusions are that TD-AE increases the robustness of the A2C algorithm to the trajectory length, and that, while promising, further study is required to fully understand the relationship between the auxiliary task's prediction timescale and the agent's performance.
Heuristic-Guided Reinforcement Learning
This work provides a framework for accelerating reinforcement learning (RL) algorithms by heuristics constructed from domain knowledge or offline data and introduces the novel concept of an improvable heuristic – a heuristic that allows an RL agent to extrapolate beyond its prior knowledge.


Multi-timescale nexting in a reinforcement learning robot
This paper presents results with a robot that learns to next in real time, making thousands of predictions about sensory input signals at timescales from 0.1 to 8 seconds, and extends nexting beyond fixed timescales by letting the discount rate be a function of the state.
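The targets these predictors learn can be written down directly (a hypothetical helper, using the convention that a timescale of roughly $1/(1-\gamma)$ timesteps corresponds to discount $\gamma$):

```python
import numpy as np

def nexting_targets(signal, gammas):
    """Ideal 'nexting' targets: for each timescale gamma, the discounted sum
    of the signal's future values, G_t = s_{t+1} + gamma * G_{t+1}.
    Sketch of the prediction targets only, not the robot's online learner."""
    signal = np.asarray(signal, dtype=float)
    T = len(signal)
    targets = {}
    for g in gammas:
        G = np.zeros(T)
        for t in range(T - 2, -1, -1):   # backward recursion over the recording
            G[t] = signal[t + 1] + g * G[t + 1]
        targets[g] = G
    return targets
```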
Separating value functions across time-scales
This work presents an extension of temporal-difference (TD) learning, called TD($\Delta$), that breaks a value function down into a series of components based on the differences between value functions with smaller discount factors, a decomposition with useful scalability and performance properties.
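The decomposition itself can be sketched in a few lines (a hypothetical tabular illustration, not the paper's learning algorithm):

```python
import numpy as np

def delta_components(values_by_gamma):
    """TD(Delta)-style decomposition: given value estimates V_{gamma_z} for an
    increasing sequence of discounts [(g0, V_g0), (g1, V_g1), ...], form
    W_0 = V_{g0} and W_z = V_{gz} - V_{g(z-1)}; the components sum back to
    the value function at the largest discount."""
    values = [np.asarray(v, dtype=float) for _, v in values_by_gamma]
    return [values[0]] + [values[z] - values[z - 1] for z in range(1, len(values))]
```

Each component captures the extra value contributed by extending the horizon from one discount to the next, which is what makes the pieces individually easier to learn.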
Universal Value Function Approximators
An efficient technique for supervised learning of universal value function approximators (UVFAs) $V(s, g; \theta)$ that generalise not just over states $s$ but also over goals $g$ is developed, and it is demonstrated that a UVFA can successfully generalise to previously unseen goals.
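One simple instance of the factorised two-stream form discussed in the paper can be sketched as a bilinear score between state and goal embeddings (names here are illustrative, not the paper's):

```python
import numpy as np

def uvfa_value(theta, phi_s, phi_g):
    """Minimal UVFA sketch: V(s, g; theta) as a bilinear form between state
    features phi_s and goal features phi_g, so one set of parameters
    generalises across both states and goals."""
    return float(phi_s @ theta @ phi_g)
```

Evaluating a previously unseen goal only requires its feature vector `phi_g`; no per-goal parameters exist.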
Meta-Gradient Reinforcement Learning
A gradient-based meta-learning algorithm is discussed that can adapt the nature of the return online, while interacting with and learning from the environment, and that achieved a new state-of-the-art performance.
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
It is shown that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way, and may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
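An option is defined by three components, which can be written down directly (a minimal hypothetical container; `step` is an assumed deterministic transition function, and termination is taken as 0/1 here rather than a probability):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    """An option: initiation predicate I(s), internal policy pi(s) -> action,
    and termination function beta(s) in [0, 1]."""
    initiation: Callable[[int], bool]
    policy: Callable[[int], int]
    beta: Callable[[int], float]

def run_option(option, state, step):
    """Execute an option until termination in a deterministic environment;
    the whole execution looks like a single temporally extended action."""
    assert option.initiation(state)
    while True:
        state = step(state, option.policy(state))
        if option.beta(state) >= 1.0:
            return state
```

Because `run_option` maps a start state to a termination state, it can be slotted in wherever a primitive action would be, which is the interchangeability the paper emphasises.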
A Distributional Perspective on Reinforcement Learning
This paper argues for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent, and designs a new algorithm which applies Bellman's equation to the learning of approximate value distributions.
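The distributional Bellman operator on a categorical return distribution can be sketched as follows (the paper's C51 algorithm additionally projects the shifted atoms back onto a fixed support, which this sketch omits):

```python
import numpy as np

def distributional_bellman(atoms, probs, r, gamma):
    """Apply the distributional Bellman operator to a categorical return
    distribution: each atom z becomes r + gamma * z; probabilities are
    unchanged. The mean of the result obeys the ordinary Bellman equation."""
    return r + gamma * np.asarray(atoms, dtype=float), np.asarray(probs, dtype=float)
```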
Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction
Results using Horde on a multi-sensored mobile robot to successfully learn goal-oriented behaviors and long-term predictions from off-policy experience are presented.
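A bare-bones version of the "many demons, shared experience" idea looks like this (an on-policy TD(0) sketch with invented names; Horde itself uses off-policy GTD(λ) so the demons can learn about policies other than the one being followed):

```python
import numpy as np

def horde_update(weights, phi, phi_next, cumulants, gammas, alpha=0.1):
    """One shared-experience update of many GVF 'demons': demon i predicts the
    discounted sum of its own cumulant c_i at its own timescale g_i, and all
    demons learn in parallel from the same transition."""
    for i, (c, g) in enumerate(zip(cumulants, gammas)):
        delta = c + g * (weights[i] @ phi_next) - (weights[i] @ phi)
        weights[i] = weights[i] + alpha * delta * phi
    return weights
```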
Successor Features for Transfer in Reinforcement Learning
This work proposes a transfer framework for the scenario where the reward function changes between tasks but the environment's dynamics remain the same, derives two theorems that place the approach on firm theoretical ground, and presents experiments showing that it successfully promotes transfer in practice.
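In the tabular case the decomposition behind this framework has a closed form (a sketch with invented names; the paper works with learned approximations):

```python
import numpy as np

def successor_features(P, Phi, gamma):
    """Tabular successor features under a policy with transition matrix P and
    feature matrix Phi (one row per state): Psi = (I - gamma P)^{-1} Phi.
    If rewards factor as r(s) = Phi @ w, then V = Psi @ w, so a changed
    reward needs only a new w while Psi is reused -- the transfer mechanism."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, Phi)
```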
Universal Successor Representations for Transfer Reinforcement Learning
This work proposes universal successor representations (USR) to represent transferable knowledge, together with a USR approximator (USRA) that can be trained by interacting with the environment and effectively applied to new tasks.