• Corpus ID: 239049537

Enhanced Recurrent Neural Tangent Kernels for Non-Time-Series Data

Sina Alemohammad, Randall Balestriero, Zichao Wang, Richard Baraniuk
Kernels derived from deep neural networks (DNNs) in the infinite-width regime provide not only high performance in a range of machine learning tasks but also new theoretical insights into DNN training dynamics and generalization. In this paper, we extend the family of kernels associated with recurrent neural networks (RNNs), which were previously derived only for simple RNNs, to more complex architectures including bidirectional RNNs and RNNs with average pooling. We also develop a fast GPU… 

Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks
Results are reported suggesting that neural tangent kernels perform strongly on low-data tasks. Comparing the NTK with the finite-width net it was derived from shows that NTK behavior starts at lower net widths than theoretical analysis suggests.
Learning and Generalization in RNNs
This paper proves that RNNs can learn functions of sequences and introduces new proof techniques for extracting information from the hidden state of the RNN, addressing a crucial weakness in previous work.
The Recurrent Neural Tangent Kernel
This paper introduces and studies the Recurrent Neural Tangent Kernel (RNTK), which provides new insights into the behavior of overparametrized RNNs, including how different time steps are weighted by the RNTK to form the output under different initialization parameters and nonlinearity choices, and how inputs of different lengths are treated.
On the Similarity between the Laplace and Neural Tangent Kernels
It is shown that the NTK for fully connected networks is closely related to the standard Laplace kernel. It is also shown theoretically that, for normalized data on the hypersphere, both kernels have the same eigenfunctions and their eigenvalues decay polynomially at the same rate, implying that their Reproducing Kernel Hilbert Spaces (RKHS) include the same sets of functions.
SymJAX: symbolic CPU/GPU/TPU programming
SymJAX is a symbolic programming version of JAX simplifying graph input/output/updates and providing additional functionalities for general machine learning and deep learning applications, including Lasagne-like deep learning functionalities.
Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes
This work derives an analogous equivalence for multi-layer convolutional neural networks (CNNs) both with and without pooling layers, and introduces a Monte Carlo method to estimate the GP corresponding to a given neural network architecture, even in cases where the analytic form has too many terms to be computationally feasible.
Chickenpox Cases in Hungary: a Benchmark Dataset for Spatiotemporal Signal Processing with Graph Neural Networks
Time series analysis and forecasting experiments demonstrate that the Chickenpox Cases in Hungary dataset is adequate for comparing the predictive performance and forecasting capabilities of novel recurrent graph neural network architectures.
Tensor Programs IIb: Architectural Universality of Neural Tangent Kernel Training Dynamics
This work shows that neural networks in the so-called NTK parametrization follow kernel gradient descent dynamics in function space during training, where the kernel is the infinite-width NTK.
Wearing A Mask: Compressed Representations of Variable-Length Sequences Using Recurrent Neural Tangent Kernels
This work extends existing kernel-based methods to variable-length sequences using the Recurrent Neural Tangent Kernel (RNTK) and demonstrates how MASK can be used to extend principal components analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE).