# Gamma-Nets: Generalizing Value Estimation over Timescale

@inproceedings{Sherstan2020GammaNetsGV, title={Gamma-Nets: Generalizing Value Estimation over Timescale}, author={Craig Sherstan and Shibhansh Dohare and J. MacGlashan and J. G{\"u}nther and P. Pilarski}, booktitle={AAAI}, year={2020} }

We present $\Gamma$-nets, a method for generalizing value function estimation over timescale. By using the timescale as one of the estimator's inputs we can estimate value for arbitrary timescales. As a result, the prediction target for any timescale is available and we are free to train on multiple timescales at each timestep. Here we empirically evaluate $\Gamma$-nets in the policy evaluation setting. We first demonstrate the approach on a square wave and then on a robot arm using linear…
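
The core idea is that the value estimator is conditioned on the timescale, so the bootstrap target for any $\gamma$ can be formed from the same transition and several timescales can be trained on each timestep. Below is a minimal policy-evaluation sketch under linear function approximation and TD(0); the feature construction, step size, and set of training timescales are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def gamma_net_features(state_feats, gamma):
    # Illustrative feature construction: append the timescale input gamma
    # (and a bias) to the state features so one linear estimator can
    # represent values for any timescale. The paper's exact inputs may differ.
    return np.concatenate([state_feats, [gamma, 1.0]])

n_state_feats = 8
w = np.zeros(n_state_feats + 2)        # weights of the linear value estimator
alpha = 0.05                           # step size (illustrative)
train_gammas = [0.0, 0.5, 0.9, 0.99]   # timescales trained at every step

def td0_update(w, s_feats, cumulant, next_s_feats):
    # One TD(0) policy-evaluation step. Because the estimator is conditioned
    # on gamma, the bootstrap target for every timescale is available from
    # the same transition, so all training timescales are updated at once.
    for gamma in train_gammas:
        x = gamma_net_features(s_feats, gamma)
        x_next = gamma_net_features(next_s_feats, gamma)
        delta = cumulant + gamma * (w @ x_next) - (w @ x)
        w = w + alpha * delta * x
    return w

# Toy usage: a stream of random state features with a constant cumulant of 1.
s = rng.normal(size=n_state_feats)
for _ in range(1000):
    s_next = rng.normal(size=n_state_feats)
    w = td0_update(w, s, cumulant=1.0, next_s_feats=s_next)
    s = s_next

# Query the estimator at an arbitrary timescale, including ones never trained on.
print(w @ gamma_net_features(s, 0.95))
```

Here the timescale is fed in as the raw $\gamma$; other encodings of the timescale input are possible, and the paper should be consulted for the exact choice.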


#### 3 Citations

Discount Factor as a Regularizer in Reinforcement Learning

- Computer Science, Mathematics
- ICML
- 2020

For several Temporal-Difference (TD) learning methods, an explicit equivalence is shown between using a reduced discount factor and adding an explicit regularization term to the algorithm’s loss.

Work in Progress: Temporally Extended Auxiliary Tasks

- Computer Science
- ArXiv
- 2020

The overall conclusions are that TD-AE increases the robustness of the A2C algorithm to trajectory length and that, while promising, further study is required to fully understand the relationship between auxiliary task prediction timescale and the agent's performance.

Heuristic-Guided Reinforcement Learning

- Computer Science
- ArXiv
- 2021

This work provides a framework for accelerating reinforcement learning (RL) algorithms by heuristics constructed from domain knowledge or offline data and introduces the novel concept of an improvable heuristic – a heuristic that allows an RL agent to extrapolate beyond its prior knowledge.

#### References

Showing 1–10 of 27 references

Multi-timescale nexting in a reinforcement learning robot

- Computer Science
- Adapt. Behav.
- 2014

This paper presents results with a robot that learns to next in real time, making thousands of predictions about sensory input signals at timescales from 0.1 to 8 seconds, and extends nexting beyond simple timescales by letting the discount rate be a function of the state.
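
The state-dependent discount mentioned in this summary corresponds to the general value function (GVF) form of the return; a standard way to write it (notation may differ slightly from the paper) is

$$G_t = \sum_{k=0}^{\infty} \left( \prod_{j=1}^{k} \gamma(S_{t+j}) \right) C_{t+k+1}, \qquad v(s) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s \right],$$

where $C$ is the cumulant (e.g., a sensory signal) and $\gamma(\cdot)$ is the state-dependent continuation function.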

Multi-timescale Nexting in a Reinforcement Learning Robot

- Computer Science
- SAB
- 2012

This paper presents results with a robot that learns to next in real time, predicting thousands of features of the world’s state, including all sensory inputs, at timescales from 0.1 to 8 seconds.

Separating value functions across time-scales

- Computer Science, Mathematics
- ICML
- 2019

This work presents an extension of temporal difference (TD) learning, which it calls TD($\Delta$), that breaks down a value function into a series of components based on the differences between value functions with smaller discount factors, which has useful properties in scalability and performance.
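
The decomposition described here can be stated compactly: assuming an increasing sequence of discount factors $\gamma_0 < \gamma_1 < \dots < \gamma_Z$, each component is the difference between value functions at adjacent discounts, and the components sum back to the full estimate (standard form; see the paper for the exact learning rules):

$$W_0(s) = V_{\gamma_0}(s), \qquad W_z(s) = V_{\gamma_z}(s) - V_{\gamma_{z-1}}(s) \ \ (z \geq 1), \qquad V_{\gamma_Z}(s) = \sum_{z=0}^{Z} W_z(s).$$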

Universal Value Function Approximators

- Mathematics, Computer Science
- ICML
- 2015

An efficient technique for supervised learning of universal value function approximators (UVFAs) $V(s, g; \theta)$ that generalise not just over states $s$ but also over goals $g$ is developed and it is demonstrated that a UVFA can successfully generalise to previously unseen goals.
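
For context, the UVFA summarized here is a single estimator over state–goal pairs; one architecture used in that line of work factors it into separate state and goal embeddings combined by an inner product (a detail worth verifying against the original paper):

$$V(s, g; \theta) \approx \phi(s)^{\top} \psi(g),$$

with $\phi$ and $\psi$ learned embedding functions.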

Meta-Gradient Reinforcement Learning

- Computer Science, Mathematics
- NeurIPS
- 2018

A gradient-based meta-learning algorithm is discussed that is able to adapt the nature of the return, online, whilst interacting and learning from the environment, and achieves new state-of-the-art performance.

Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning

- Computer Science
- Artif. Intell.
- 1999

It is shown that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way and may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
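
In this framework an option is specified by three components (standard notation):

$$o = \langle \mathcal{I}, \pi, \beta \rangle,$$

where $\mathcal{I} \subseteq \mathcal{S}$ is the set of states in which the option can be initiated, $\pi$ is the option's internal policy, and $\beta : \mathcal{S} \to [0, 1]$ gives the probability of terminating in each state.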

A Distributional Perspective on Reinforcement Learning

- Computer Science, Mathematics
- ICML
- 2017

This paper argues for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent, and designs a new algorithm which applies Bellman's equation to the learning of approximate value distributions.
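
The value distribution referred to here satisfies a distributional analogue of the Bellman equation, written as an equality in distribution (standard notation):

$$Z(s, a) \stackrel{D}{=} R(s, a) + \gamma Z(S', A'), \qquad S' \sim P(\cdot \mid s, a), \ A' \sim \pi(\cdot \mid S').$$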

Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction

- Computer Science
- AAMAS
- 2011

Results using Horde on a multi-sensored mobile robot to successfully learn goal-oriented behaviors and long-term predictions from off-policy experience are presented.

Successor Features for Transfer in Reinforcement Learning

- Computer Science
- NIPS
- 2017

This work proposes a transfer framework for the scenario where the reward function changes between tasks but the environment's dynamics remain the same, and derives two theorems that set the approach in firm theoretical ground and presents experiments that show that it successfully promotes transfer in practice.
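
The transfer setting described in this summary assumes rewards are linear in shared features while the dynamics stay fixed; in the usual successor-feature notation:

$$r(s, a, s') = \phi(s, a, s')^{\top} \mathbf{w}, \qquad Q^{\pi}(s, a) = \psi^{\pi}(s, a)^{\top} \mathbf{w}, \qquad \psi^{\pi}(s, a) = \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \phi(S_t, A_t, S_{t+1}) \ \middle|\ S_0 = s, A_0 = a \right],$$

so a new task only requires learning a new weight vector $\mathbf{w}$.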

Universal Successor Representations for Transfer Reinforcement Learning

- Computer Science, Mathematics
- ICLR
- 2018

This work proposes to use universal successor representations (USR) to represent the transferable knowledge and a USR approximator (USRA) that can be trained by interacting with the environment and effectively applied to new tasks.