# Natural Gradient Deep Q-learning

@article{Knight2018NaturalGD, title={Natural Gradient Deep Q-learning}, author={Ethan Knight and Osher Lerner}, journal={ArXiv}, year={2018}, volume={abs/1803.07482} }

This paper presents findings for training a Q-learning reinforcement learning agent using natural gradient techniques. We compare the original deep Q-network (DQN) algorithm to its natural gradient counterpart (NGDQN), measuring NGDQN and DQN performance on classic control environments without target networks. We find that NGDQN performs favorably relative to DQN, converging to significantly better policies faster and more frequently. These results indicate that natural gradient could be used…
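The core idea — preconditioning the Q-network's gradient by the inverse Fisher information matrix rather than applying it directly — can be sketched as follows. This is a minimal illustration on a toy linear Q-function with an empirical-Fisher approximation; the model, damping constant, and learning rate are assumptions for the sketch, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear Q-function Q(s, a; w) = w[a] @ s (illustrative, not the paper's network).
n_features, n_actions = 4, 2
w = rng.normal(size=(n_actions, n_features))

# A batch of (state, action, td_target) tuples standing in for replayed transitions.
states = rng.normal(size=(32, n_features))
actions = rng.integers(0, n_actions, size=32)
targets = rng.normal(size=32)

def sample_grad(s, a, y):
    """Per-sample gradient of the squared TD error w.r.t. the flattened weights."""
    g = np.zeros_like(w)
    g[a] = 2.0 * (w[a] @ s - y) * s
    return g.ravel()

grads = np.stack([sample_grad(s, a, y) for s, a, y in zip(states, actions, targets)])
mean_grad = grads.mean(axis=0)

# Empirical Fisher approximation: average outer product of per-sample gradients,
# damped so the matrix is invertible.
fisher = grads.T @ grads / len(grads) + 1e-3 * np.eye(w.size)

# Natural gradient step: precondition the mean gradient by the inverse Fisher.
nat_grad = np.linalg.solve(fisher, mean_grad)
w -= (0.05 * nat_grad).reshape(w.shape)
```

In practice deep-RL implementations avoid forming the Fisher matrix explicitly (e.g. via Kronecker-factored or conjugate-gradient approximations); the dense solve above is only feasible because the toy parameter vector is tiny.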

## 8 Citations

### Beyond Target Networks: Improving Deep Q-learning with Functional Regularization

- Computer Science, ArXiv
- 2021

An alternative training method based on functional regularization which uses up-to-date parameters to estimate the target Q-values, thereby speeding up training while maintaining stability and showing empirical improvements in sample efficiency and performance across a range of Atari and simulated robotics environments.

### Towards Characterizing Divergence in Deep Q-Learning

- Computer Science, ArXiv
- 2019

An algorithm is developed which permits stable deep Q-learning for continuous control without any of the tricks conventionally used (such as target networks, adaptive gradient optimizers, or using multiple Q functions).

### Optimizing Q-Learning with K-FAC Algorithm

- Computer Science, AIST
- 2019

Considering the latest results, it is shown that DDQN with K-FAC learns more quickly than with other optimizers and improves steadily, in contrast to similar runs with Adam or RMSProp.

### Analysis of Q-learning with Adaptation and Momentum Restart for Gradient Descent

- Computer Science, IJCAI
- 2020

The convergence rate for Q-AMSGrad, which is the Q-learning algorithm with the AMSGrad update (a commonly adopted alternative to Adam for theoretical analysis), is characterized, and a momentum restart scheme is proposed, resulting in the so-called Q-AMSGradR algorithm, which outperforms vanilla Q-learning with SGD updates.
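The AMSGrad update mentioned above differs from Adam by maintaining a running maximum of the second-moment estimate, so the effective step size never grows. A minimal sketch (hyperparameter defaults and the quadratic test objective are illustrative assumptions):

```python
import numpy as np

def amsgrad_step(theta, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One AMSGrad update; `state` holds the (m, v, v_hat) running statistics."""
    m, v, v_hat = state
    m = b1 * m + (1 - b1) * grad          # first moment (as in Adam)
    v = b2 * v + (1 - b2) * grad ** 2     # second moment (as in Adam)
    v_hat = np.maximum(v_hat, v)          # AMSGrad: non-decreasing denominator
    theta = theta - lr * m / (np.sqrt(v_hat) + eps)
    return theta, (m, v, v_hat)

# Minimize ||theta||^2 as a toy objective.
theta = np.array([1.0, -2.0])
state = (np.zeros(2), np.zeros(2), np.zeros(2))
for _ in range(100):
    grad = 2 * theta                      # gradient of ||theta||^2
    theta, state = amsgrad_step(theta, grad, state, lr=0.1)
```

In Q-learning the `grad` would be the semi-gradient of the TD error rather than a true objective gradient, which is precisely why the convergence analysis cited here is nontrivial.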

### BRPO: Batch Residual Policy Optimization

- Computer Science, IJCAI
- 2020

This work derives a new RL method, BRPO, which learns both the policy and the allowable deviation that jointly maximize a lower bound on policy performance, and shows that BRPO achieves state-of-the-art performance on a number of tasks.

### Bridging the Gap Between Target Networks and Functional Regularization

- Computer Science, ArXiv
- 2022

It is demonstrated that replacing Target Networks with the more theoretically grounded Functional Regularization approach leads to better sample efficiency and performance improvements.

### Direction Concentration Learning: Enhancing Congruency in Machine Learning

- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2021

The experimental results show that the proposed DCL method generalizes to state-of-the-art models and optimizers, improves performance on saliency prediction, continual learning, and classification tasks, and helps mitigate the catastrophic forgetting problem in the continual learning task.

### Toward Efficient Gradient-Based Value Estimation

- Computer Science
- 2023

To resolve the adverse effect of the poor conditioning of the MSBE on gradient-based methods, a low-complexity, batch-free proximal method that approximately follows the Gauss-Newton direction and is asymptotically robust to parameterization is proposed.

## References

Showing 1-10 of 36 references

### Playing Atari with Deep Reinforcement Learning

- Computer Science, ArXiv
- 2013

This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.

### Deep Q-learning From Demonstrations

- Computer Science, AAAI
- 2018

This paper presents an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process, even from relatively small amounts of demonstration data, and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism.

### Natural Temporal Difference Learning

- Computer Science, AAAI
- 2014

This paper presents and analyzes quadratic and linear time natural temporal difference learning algorithms, and proves that they are covariant, and suggests that the natural algorithms can match or outperform their non-natural counterparts using linear function approximation, and drastically improve upon them when using non-linear function approximation.

### On-line Q-learning using connectionist systems

- Computer Science
- 1994

Simulations show that on-line learning algorithms are less sensitive to the choice of training parameters than backward replay, and that the alternative update rules of MCQ-L and Q(λ) are more robust than standard Q-learning updates.

### Self-improving reactive agents based on reinforcement learning, planning and teaching

- Computer Science, Machine Learning
- 2004

This paper compares eight reinforcement learning frameworks: adaptive heuristic critic (AHC) learning due to Sutton, Q-learning due to Watkins, and three extensions to each basic method for speeding up learning; the extensions are experience replay, learning action models for planning, and teaching.

### A Natural Policy Gradient

- Computer Science, NIPS
- 2001

This work provides a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space and shows drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.
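The steepest-descent direction referred to here has a compact standard form; in the usual natural-policy-gradient notation, with Fisher information matrix $F$:

```latex
\tilde{\nabla}_\theta J(\theta) = F(\theta)^{-1} \nabla_\theta J(\theta),
\qquad
F(\theta) = \mathbb{E}_{\pi_\theta}\!\left[
  \nabla_\theta \log \pi_\theta(a \mid s)\,
  \nabla_\theta \log \pi_\theta(a \mid s)^{\top}
\right]
```

This is the policy-gradient analogue of the preconditioning that NGDQN applies to the Q-network's loss gradient.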

### Gradient temporal-difference learning algorithms

- Computer Science
- 2011

We present a new family of gradient temporal-difference (TD) learning methods with function approximation whose complexity, both in terms of memory and per-time-step computation, scales linearly with…

### Reinforcement learning for robots using neural networks

- Computer Science
- 1992

This dissertation concludes that it is possible to build artificial agents that can acquire complex control policies effectively by reinforcement learning, enabling its application to complex robot-learning problems.

### Prioritized Experience Replay

- Computer Science, ICLR
- 2016

A framework for prioritizing experience, so as to replay important transitions more frequently, and therefore learn more efficiently, in Deep Q-Networks, a reinforcement learning algorithm that achieved human-level performance across many Atari games.
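The proportional variant of the prioritization scheme described above can be sketched as follows; the priority exponent `alpha`, the flat-buffer layout (the paper uses a sum-tree for efficiency), and the transition format are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

class ProportionalReplay:
    """Minimal proportional prioritized replay (no sum-tree; O(n) sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error):
        # Evict the oldest transition when full.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        # Priority proportional to |TD error|^alpha, with eps so nothing is starved.
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size):
        p = np.asarray(self.priorities)
        probs = p / p.sum()
        idx = rng.choice(len(self.buffer), size=batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling,
        # normalized by the maximum weight for stability.
        weights = (len(self.buffer) * probs[idx]) ** -1.0
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights

buf = ProportionalReplay(capacity=100)
for _ in range(50):
    buf.add(("s", "a", 0.0, "s2"), td_error=rng.normal())
batch, idx, weights = buf.sample(8)
```

After each learning step the sampled transitions' priorities would be refreshed with their new TD errors; that update path is omitted here for brevity.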

### Trust Region Policy Optimization

- Computer Science, ICML
- 2015

A method for optimizing control policies, with guaranteed monotonic improvement, by making several approximations to the theoretically-justified scheme, called Trust Region Policy Optimization (TRPO).