Overcoming the Spectral Bias of Neural Value Approximation

@article{Yang2022OvercomingTS,
  title={Overcoming the Spectral Bias of Neural Value Approximation},
  author={Ge Yang and Anurag Ajay and Pulkit Agrawal},
  journal={ArXiv},
  year={2022},
  volume={abs/2206.04672}
}
Value approximation using deep neural networks is at the heart of off-policy deep reinforcement learning, and is often the primary module that provides learning signals to the rest of the algorithm. While multi-layer perceptron networks are universal function approximators, recent works in neural kernel regression suggest the presence of a spectral bias, where fitting high-frequency components of the value function requires exponentially more gradient update steps than the low-frequency ones… 
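
The spectral bias described here is easy to reproduce on a toy problem. Below is a minimal sketch, assuming PyTorch and an illustrative 1D target that mixes one low-frequency and one high-frequency sinusoid; under full-batch SGD the low-frequency component of the residual shrinks long before the high-frequency one does.

# Toy illustration of spectral bias (assumes PyTorch; target and network sizes are illustrative).
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-1.0, 1.0, 256).unsqueeze(1)
low = torch.sin(2 * math.pi * 1.0 * x)       # low-frequency component
high = torch.sin(2 * math.pi * 16.0 * x)     # high-frequency component
y = low + high

net = nn.Sequential(nn.Linear(1, 256), nn.ReLU(),
                    nn.Linear(256, 256), nn.ReLU(),
                    nn.Linear(256, 1))
opt = torch.optim.SGD(net.parameters(), lr=1e-2)

for step in range(20001):
    loss = ((net(x) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 5000 == 0:
        with torch.no_grad():
            resid = y - net(x)
            # Project the residual onto each component to see which part is still unfit.
            err_low = (resid * low).mean().abs().item()
            err_high = (resid * high).mean().abs().item()
        print(f"step {step:6d}  low-freq residual {err_low:.4f}  high-freq residual {err_high:.4f}")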

Spectral Bias Outside the Training Set for Deep Networks in the Kernel Regime

The proof exploits the low effective rank of the Fisher information matrix at initialization, which implies a low effective dimension of the model (far smaller than the number of parameters), and concludes that local capacity control arising from this low effective rank is still theoretically underexplored.

Neural networks trained with SGD learn distributions of increasing complexity

It is shown that neural networks trained using stochastic gradient descent initially classify their inputs using lower-order input statistics, like mean and covariance, and exploit higher-order statistics only later during training.

The Ladder in Chaos: A Simple and Effective Improvement to General DRL Algorithms by Policy Path Trimming and Boosting

This paper studies how the policy networks of typical DRL agents evolve during the learning process by empirically investigating several kinds of temporal change for each policy parameter, and proposes a simple and effective method, called Policy Path Trimming and Boosting (PPTB), as a general plug-in improvement to DRL algorithms.

Contrastive Learning as Goal-Conditioned Reinforcement Learning

This paper builds upon prior work and applies contrastive representation learning to action-labeled trajectories, such that the inner product of the learned representations exactly corresponds to a goal-conditioned value function.
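
A minimal sketch of the kind of critic this describes, assuming PyTorch and illustrative dimensions (it is not the paper's implementation): two encoders whose inner product scores (state, action, goal) triples, trained with an InfoNCE-style classification loss in which the other goals in the batch serve as negatives.

# Sketch of a contrastive critic f(s, a, g) = phi(s, a) . psi(g); sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, action_dim, goal_dim, repr_dim = 8, 2, 8, 64

phi = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                    nn.Linear(256, repr_dim))   # state-action encoder
psi = nn.Sequential(nn.Linear(goal_dim, 256), nn.ReLU(),
                    nn.Linear(256, repr_dim))   # goal encoder
opt = torch.optim.Adam(list(phi.parameters()) + list(psi.parameters()), lr=3e-4)

def contrastive_critic_loss(states, actions, goals):
    """The i-th (state, action) pair is matched with the i-th goal; every other
    goal in the batch acts as a negative."""
    sa = phi(torch.cat([states, actions], dim=-1))   # (B, repr_dim)
    g = psi(goals)                                   # (B, repr_dim)
    logits = sa @ g.T                                # (B, B) matrix of inner products
    labels = torch.arange(len(states))
    return F.cross_entropy(logits, labels)

# Toy batch of random tensors standing in for (state, action, future-state-as-goal) triples.
B = 32
loss = contrastive_critic_loss(torch.randn(B, state_dim),
                               torch.randn(B, action_dim),
                               torch.randn(B, goal_dim))
loss.backward()
opt.step()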

Strong Lensing Source Reconstruction Using Continuous Neural Fields

From the nature of dark matter to the rate of expansion of our Universe, observations of distant galaxies distorted through strong gravitational lensing have the potential to answer some of the major…

Learning Dynamics and Generalization in Deep Reinforcement Learning

The learning dynamics of temporal difference algorithms are analyzed to gain novel insight into the tension between these two objectives, and it is shown theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training, and at the same time induces the second-order effect of discouraging generalization.

References

Showing 1-10 of 65 references

Towards Understanding the Spectral Bias of Deep Learning

It is proved that the training process of neural networks can be decomposed along different directions defined by the eigenfunctions of the neural tangent kernel, where each direction has its own convergence rate and the rate is determined by the corresponding eigenvalue.
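
In the linearized regime this statement has a simple closed form. Assuming continuous-time gradient flow with rate \eta on the squared loss (a standard setting, stated here for context rather than taken from this particular paper), the residual along each eigendirection decays independently:

K(x, x') = \sum_i \lambda_i \, \phi_i(x) \, \phi_i(x'), \qquad f_t - f^* = \sum_i a_i(t) \, \phi_i, \qquad a_i(t) = a_i(0) \, e^{-\eta \lambda_i t}

Directions with small eigenvalues, which for common architectures correspond to high-frequency eigenfunctions, therefore take on the order of 1/\lambda_i longer to fit.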

On Lazy Training in Differentiable Programming

This work shows that this "lazy training" phenomenon is not specific to over-parameterized neural networks, and is due to a choice of scaling that makes the model behave as its linearization around the initialization, thus yielding a model equivalent to learning with positive-definite kernels.
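
The linearization in question is the first-order Taylor expansion of the network around its initialization \theta_0, and the resulting positive-definite kernel is the (empirical) neural tangent kernel:

f(x; \theta) \approx f(x; \theta_0) + \nabla_\theta f(x; \theta_0)^\top (\theta - \theta_0), \qquad K(x, x') = \nabla_\theta f(x; \theta_0)^\top \nabla_\theta f(x'; \theta_0)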

Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning

An implicit under-parameterization phenomenon in value-based deep RL methods that use bootstrapping is identified: when value functions are trained with gradient descent using iterated regression onto target values generated by previous instances of the value network, more gradient updates decrease the expressivity of the current value network.
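
A small sketch, assuming PyTorch, of one common way to monitor this effect: track the effective rank of the value network's penultimate-layer feature matrix, defined here as the smallest number of leading singular values needed to capture a (1 - delta) fraction of the total singular-value mass; the threshold delta = 0.01 is an illustrative choice.

# Effective-rank monitor for a feature matrix (assumes PyTorch).
import torch

def effective_rank(features: torch.Tensor, delta: float = 0.01) -> int:
    """Smallest k such that the top-k singular values carry a (1 - delta)
    fraction of the total singular-value mass."""
    sigma = torch.linalg.svdvals(features)            # singular values, descending
    cum = torch.cumsum(sigma, dim=0) / sigma.sum()
    return int((cum >= 1.0 - delta).nonzero()[0].item()) + 1

# Toy example: 256 feature vectors of width 64 with a deliberately collapsed spectrum.
feats = torch.randn(256, 8) @ torch.randn(8, 64)      # rank-8 features in a 64-dim space
print(effective_rank(feats))                          # prints a value close to 8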

Towards Characterizing Divergence in Deep Q-Learning

An algorithm is developed which permits stable deep Q-learning for continuous control without any of the tricks conventionally used (such as target networks, adaptive gradient optimizers, or multiple Q-functions).

SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning

SUNRISE is a simple, unified ensemble method that is compatible with various off-policy RL algorithms and significantly improves the performance of existing ones, such as Soft Actor-Critic and Rainbow DQN, on both continuous and discrete control tasks in low- and high-dimensional environments.

On the Inductive Bias of Neural Tangent Kernels

This work studies smoothness, approximation, and stability properties of functions with finite norm in the reproducing kernel Hilbert space of the neural tangent kernel, including stability to image deformations in the case of convolutional networks, and compares these properties to those of other known kernels for similar architectures.

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, which achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
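
For reference, the maximum-entropy objective that soft actor-critic optimizes augments the expected return with a policy-entropy bonus weighted by a temperature \alpha:

J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]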

Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks

This work investigates the generalization error of kernel regression and proposes a predictive theory of generalization applicable to real data; the theory explains various generalization phenomena observed in wide neural networks, which admit a kernel limit and generalize well despite being overparameterized.

On the Expressivity of Neural Networks for Deep Reinforcement Learning

It is shown, theoretically and empirically, that even for one-dimensional continuous state space, there are many MDPs whose optimal $Q$-functions and policies are much more complex than the dynamics.

Beyond Target Networks: Improving Deep Q-learning with Functional Regularization

An alternative training method based on functional regularization is proposed, which uses up-to-date parameters to estimate the target Q-values, thereby speeding up training while maintaining stability; empirical improvements in sample efficiency and performance are shown across a range of Atari and simulated robotics environments.
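
A hedged sketch of the general recipe, assuming PyTorch and a discrete-action Q-network: bootstrap the TD target with the current, up-to-date network instead of a frozen target network, and add a functional regularizer that keeps the network's outputs close to those of a slowly-updated reference network. The loss below illustrates the idea; the weight lam, the reference network, and the update schedule are assumptions, not the paper's exact objective.

# Illustrative TD loss with functional regularization in place of a target network (assumes PyTorch).
import copy
import torch
import torch.nn as nn

q = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))   # Q(s) -> per-action values
q_ref = copy.deepcopy(q)                                            # slowly-updated reference network
opt = torch.optim.Adam(q.parameters(), lr=1e-3)
gamma, lam = 0.99, 1.0

def loss_fn(s, a, r, s_next, done):
    q_sa = q(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1 - done) * q(s_next).max(dim=1).values   # bootstrap with up-to-date parameters
        q_prior = q_ref(s)
    td_loss = ((q_sa - target) ** 2).mean()
    func_reg = ((q(s) - q_prior) ** 2).mean()     # functional regularization term
    return td_loss + lam * func_reg

# Toy batch of random transitions.
B = 16
loss = loss_fn(torch.randn(B, 4), torch.randint(0, 2, (B,)), torch.randn(B),
               torch.randn(B, 4), torch.zeros(B))
loss.backward()
opt.step()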
...