Corpus ID: 244908765

Functional Regularization for Reinforcement Learning via Learned Fourier Features

@inproceedings{Li2021FunctionalRF,
  title={Functional Regularization for Reinforcement Learning via Learned Fourier Features},
  author={Alexander Li and Deepak Pathak},
  booktitle={NeurIPS},
  year={2021}
}
We propose a simple architecture for deep reinforcement learning that embeds inputs into a learned Fourier basis, and we show that it improves the sample efficiency of both state-based and image-based RL. We perform an infinite-width analysis of our architecture using the Neural Tangent Kernel and show theoretically that tuning the initial variance of the Fourier basis is equivalent to functional regularization of the learned deep network. That is, these learned Fourier features allow for adjusting…
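
The embedding itself is compact enough to sketch directly. Below is a minimal, illustrative PyTorch version, assuming a trainable frequency matrix initialized from a zero-mean Gaussian whose scale plays the regularization role described above; the class name, the sin/cos parameterization, and the init_scale hyperparameter are assumptions for illustration, not the authors' exact implementation.

import torch
import torch.nn as nn

class LearnedFourierFeatures(nn.Module):
    """Embed inputs into a trainable Fourier basis (illustrative sketch)."""

    def __init__(self, in_dim: int, n_features: int, init_scale: float = 1.0):
        super().__init__()
        # Trainable frequency matrix; init_scale is the initial-variance knob
        # the abstract ties to functional regularization (Gaussian init assumed).
        self.B = nn.Parameter(torch.randn(in_dim, n_features) * init_scale)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        proj = x @ self.B  # (batch, n_features)
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)

# Example: a state-based critic reading observations through the embedding.
critic = nn.Sequential(
    LearnedFourierFeatures(in_dim=17, n_features=256, init_scale=0.1),
    nn.Linear(512, 256), nn.ReLU(),  # 512 = 2 * n_features (sin and cos)
    nn.Linear(256, 1),
)

On this reading of the abstract, a small init_scale biases the downstream network toward smooth value functions, while a larger scale admits higher-frequency components.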
Interpreting Distributional Reinforcement Learning: Regularization and Optimization Perspectives
TLDR
Rigorous experiments reveal the different regularization effects, as well as the mutual impact of vanilla entropy and risk-aware entropy regularization, in distributional RL, focusing specifically on actor-critic algorithms.
Overcoming the Spectral Bias of Neural Value Approximation
TLDR
This work re-examines off-policy reinforcement learning through the lens of kernel regression and proposes to overcome this bias with a composite neural tangent kernel built from Fourier feature networks.
Learning Transferable Policies by Inferring Agent Morphology
TLDR
This work proposes the first reinforcement learning algorithm that can train a policy to generalize to new agent morphologies without requiring an explicit description of the agent's morphology in advance, and shows that it attains good performance despite this.

References

Showing 1-10 of 51 references
Beyond Target Networks: Improving Deep Q-learning with Functional Regularization
TLDR
An alternative training method based on functional regularization is proposed, which uses up-to-date parameters to estimate the target Q-values, thereby speeding up training while maintaining stability and yielding empirical improvements in sample efficiency and performance across a range of Atari and simulated robotics environments.
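
As a rough sketch of the idea (assuming a PyTorch-style discrete-action Q-network; the loss shape, the kappa coefficient, and the prior-network mechanics here are illustrative guesses, not the paper's exact formulation):

import torch
import torch.nn.functional as F

def fr_q_loss(q_net, prior_net, batch, gamma=0.99, kappa=1.0):
    """TD loss with a functional regularizer in place of a target network.

    Targets are bootstrapped from the *current* (up-to-date) parameters;
    stability comes from penalizing deviation of Q(s, a) from a slowly
    updated prior network's outputs in function space.
    """
    s, a, r, s_next, done = batch
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Up-to-date parameters estimate the bootstrap target.
        target = r + gamma * (1.0 - done) * q_net(s_next).max(dim=1).values
        q_prior = prior_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    td_loss = F.mse_loss(q, target)
    fr_loss = F.mse_loss(q, q_prior)  # functional regularizer
    return td_loss + kappa * fr_loss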
Diagnosing Bottlenecks in Deep Q-learning Algorithms
TLDR
It is found that large neural network architectures have many benefits with regard to learning stability and offer several practical ways to compensate for overfitting; a novel sampling method based on explicitly compensating for function approximation error yields fair improvement on high-dimensional continuous control domains.
Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
TLDR
An implicit under-parameterization phenomenon in value-based deep RL methods that use bootstrapping is identified: when value functions are trained with gradient descent using iterated regression onto target values generated by previous instances of the value network, more gradient updates decrease the expressivity of the current value network.
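
Expressivity in this line of work is commonly tracked via the effective rank of the penultimate-layer feature matrix; the sketch below computes such a measure (the srank-style definition and the delta = 0.01 threshold are assumptions based on my reading, not guaranteed to match the paper's exact metric).

import torch

def effective_rank(features: torch.Tensor, delta: float = 0.01) -> int:
    """Smallest k whose top-k singular values capture a (1 - delta)
    fraction of the feature matrix's spectral mass.

    A shrinking effective rank over training is the signature of the
    implicit under-parameterization phenomenon described above.
    """
    # features: (n_samples, feature_dim) penultimate-layer activations
    sv = torch.linalg.svdvals(features)  # singular values, descending
    cumulative = torch.cumsum(sv, dim=0) / sv.sum()
    return int((cumulative >= 1.0 - delta).nonzero()[0].item()) + 1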
A Spectral Approach to Generalization and Optimization in Neural Networks
TLDR
This work proves that if the underlying data distribution has nice spectral properties such as bandlimitedness, then gradient descent converges to generalizable local minima; it also establishes a Fourier-based generalization bound for bandlimited spaces that extends to other activation functions.
Spectral Normalisation for Deep Reinforcement Learning: an Optimisation Perspective
TLDR
Ablation studies show that modulating the parameter updates is sufficient to recover most of the performance of spectral normalisation, hinting at the need to also focus on the neural component and its learning dynamics to tackle the peculiarities of deep reinforcement learning.
On the Spectral Bias of Deep Neural Networks
TLDR
It is shown that deep networks with finite weights (or trained for a finite number of steps) are inherently biased towards representing smooth functions over the input space, and that all samples classified by a network as belonging to a certain class are connected by a path along which the network's prediction does not change.
On Lazy Training in Differentiable Programming
TLDR
This work shows that the "lazy training" phenomenon is not specific to over-parameterized neural networks and is due to a choice of scaling that makes the model behave as its linearization around the initialization, thus yielding a model equivalent to learning with positive-definite kernels.
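
The linearization at the heart of lazy training can be written in one line (this is the standard first-order expansion around the initialization \theta_0, with the standard tangent-kernel definition alongside it):

f(x;\theta) \;\approx\; f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^\top (\theta - \theta_0),
\qquad
k(x, x') = \nabla_\theta f(x;\theta_0)^\top \nabla_\theta f(x';\theta_0).

Training the right-hand side with a squared loss is kernel regression with the tangent kernel k, which is the sense in which a lazily trained model "learns with positive-definite kernels."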
Deep Reinforcement Learning with Double Q-Learning
TLDR
This paper proposes a specific adaptation to the DQN algorithm and shows that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but also leads to much better performance on several games.
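
Concretely, the adaptation (Double DQN) decouples action selection from action evaluation in the bootstrap target: the online parameters \theta pick the action and the target-network parameters \theta^- evaluate it,

y = r + \gamma \, Q\big(s', \arg\max_{a'} Q(s', a'; \theta);\; \theta^-\big),

which removes the upward bias introduced when a single max both selects and evaluates the action.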
The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies
TLDR
It is shown theoretically and experimentally that a shallow neural network without bias cannot represent or learn simple, low-frequency functions with odd frequencies, and this leads to specific predictions of the time it will take a network to learn functions of varying frequency.
Towards Characterizing Divergence in Deep Q-Learning
TLDR
An algorithm is developed which permits stable deep Q-learning for continuous control without any of the tricks conventionally used, such as target networks, adaptive gradient optimizers, or multiple Q functions.