Deep regularization and direct training of the inner layers of Neural Networks with Kernel Flows

Gene Ryan Yoo and Houman Owhadi


Do ideas have shape? Plato's theory of forms as the continuous limit of artificial neural networks
It is shown that ResNets converge, in the infinite depth limit, to a generalization of image registration algorithms, and the registration regularization strategy provides a provably robust alternative to Dropout for ANNs.
Learning "best" kernels from data in Gaussian process regression. With application to aerodynamics
Presents algorithms for selecting/designing kernels in Gaussian process regression/kriging surrogate models to approximate a regular target function from observations of it, i.e., supervised learning.
Computational Graph Completion
  • H. Owhadi · Research in the Mathematical Sciences · 2022
The Computational Graph Completion (CGC) problem addressed by the proposed framework could therefore be interpreted as a generalization of that of solving linear systems of equations to that of approximating unknown variables and functions with noisy, incomplete, and nonlinear dependencies.
SpinalNet: Deep Neural Network with Gradual Input
The human somatosensory system is studied and the SpinalNet is proposed to achieve higher accuracy with fewer computational resources while avoiding the vanishing gradient problem.
Deep Learning with Kernel Flow Regularization for Time Series Forecasting
This paper introduces the application of kernel flow methods to time series forecasting in general and describes a regularization method that applies the kernel flow loss function to LSTM layers, achieving a regularization effect similar to dropout.
Simple, low-cost and accurate data-driven geophysical forecasting with learned kernels
The proposed approach is general, and the results support the viability of kernel methods (with learned kernels) for interpretable and computationally efficient geophysical forecasting for a large diversity of processes.
Data-driven geophysical forecasting: Simple, low-cost, and accurate baselines with kernel methods
This work shows that when the kernel of these emulators is also learned from data (using kernel flows, a variant of cross-validation), the resulting data-driven models are not only faster than equation-based models but also easier to train than neural networks such as the long short-term memory network.
Learning dynamical systems from data: a simple cross-validation perspective
Kernel Flows and its variants based on Maximum Mean Discrepancy and Lyapunov exponents are presented as simple cross-validation approaches for learning the kernel used in these emulators.
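Several entries above refer to the Kernel Flows criterion for learning a kernel by cross-validation. The sketch below is an illustration under stated assumptions, not code from any of the cited papers: it implements the standard KF loss ρ = 1 − (y_sᵀ K_s⁻¹ y_s)/(yᵀ K⁻¹ y), where the subscript s denotes a random half of the data, for a Gaussian kernel, and picks the bandwidth that makes the half-sample interpolant lose the least. The grid search stands in for the gradient descent on kernel parameters used in the actual algorithm; all function names are illustrative.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix between row-vector sets X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_flow_rho(X, y, gamma, rng):
    """KF loss: relative RKHS-norm loss incurred when the kernel
    interpolant is recomputed from a random half of the data.
    rho = 1 - (y_s^T K_s^{-1} y_s) / (y^T K^{-1} y); smaller is better."""
    n = len(y)
    idx = rng.choice(n, n // 2, replace=False)       # random half-sample
    K = rbf_kernel(X, X, gamma) + 1e-6 * np.eye(n)   # jitter for stability
    Ks = K[np.ix_(idx, idx)]
    full = y @ np.linalg.solve(K, y)
    sub = y[idx] @ np.linalg.solve(Ks, y[idx])
    return 1.0 - sub / full

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0])
# Average the stochastic loss over resamples for each candidate bandwidth.
losses = {g: np.mean([kernel_flow_rho(X, y, g, rng) for _ in range(20)])
          for g in (0.01, 1.0, 100.0)}
best = min(losses, key=losses.get)
```

Because the half-sample interpolant can only have a smaller RKHS norm than the full one, ρ lies in [0, 1]; ρ near 0 means the kernel barely loses accuracy when half the data is removed, which is the selection principle these papers exploit.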
Consistency of Empirical Bayes And Kernel Flow For Hierarchical Parameter Estimation
The purpose of this paper is to compare the empirical Bayesian and approximation theoretic approaches to hierarchical learning, in terms of large data consistency, variance of estimators, robustness of the estimators to model misspecification, and computational cost.


Wide Residual Networks
This paper conducts a detailed experimental study on the architecture of ResNet blocks and proposes a novel architecture in which the depth of residual networks is decreased and their width increased; the resulting network structures, called wide residual networks (WRNs), are far superior to their commonly used thin and very deep counterparts.
Operator-Adapted Wavelets, Fast Solvers, and Numerical Homogenization
This introduction reviews, summarizes, and illustrates fundamental connections between Bayesian inference, numerical quadrature, Gaussian process regression, polyharmonic splines, and information-based complexity.
Cold Case: The Lost MNIST Digits
A reconstruction that is accurate enough to serve as a replacement for the MNIST dataset, with insignificant changes in accuracy, is proposed, and the results unambiguously confirm the trends observed by Recht et al.
A Game Theoretic Approach to Numerical Approximation and Algorithm Design
In this talk, Professor Owhadi will examine the interplay between game theory, numerical approximation, and Gaussian process regression via multiscale analysis problems, fast solvers design, operator adapted wavelet identification, and computations with dense kernel matrices.
Wide neural networks of any depth evolve as linear models under gradient descent
This work shows that for wide NNs the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.
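The "linear model obtained from the first-order Taylor expansion around the initial parameters" can be illustrated numerically. The snippet below is a toy sketch, not the cited construction: the network is narrow rather than wide and the Jacobian is taken by finite differences, but it shows the object in question, f(θ₀ + δ) ≈ f(θ₀) + ∇_θ f(θ₀)·δ. All sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp(params, x):
    """Tiny one-hidden-layer network; params is a flat vector of length 40."""
    W1 = params[:20].reshape(10, 2)
    b1 = params[20:30]
    w2 = params[30:40]
    return w2 @ np.tanh(W1 @ x + b1)

def grad_params(f, params, x, eps=1e-6):
    """Central finite-difference gradient of the scalar output w.r.t. params."""
    g = np.zeros_like(params)
    for i in range(len(params)):
        d = np.zeros_like(params)
        d[i] = eps
        g[i] = (f(params + d, x) - f(params - d, x)) / (2 * eps)
    return g

theta0 = 0.5 * rng.normal(size=40)       # initial parameters
x = np.array([0.3, -0.7])
g = grad_params(mlp, theta0, x)

delta = 1e-4 * rng.normal(size=40)       # a small parameter update
exact = mlp(theta0 + delta, x)           # true network output after the update
linear = mlp(theta0, x) + g @ delta      # first-order Taylor (linearized) model
```

For small updates the two agree to second order in ‖δ‖; the cited result is that for sufficiently wide networks the agreement persists along the entire gradient-descent trajectory.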
Do ImageNet Classifiers Generalize to ImageNet?
The results suggest that the accuracy drops are not caused by adaptivity, but by the models' inability to generalize to slightly "harder" images than those found in the original test sets.
Kernel Flows: from learning kernels from data into the abyss
Neural tangent kernel: convergence and generalization in neural networks (invited paper)
This talk will introduce this formalism and give a number of results on the Neural Tangent Kernel and explain how they give us insight into the dynamics of neural networks during training and into their generalization features.
Do CIFAR-10 Classifiers Generalize to CIFAR-10?
This work measures the accuracy of CIFAR-10 classifiers by creating a new test set of truly unseen images and finds a large drop in accuracy for a broad range of deep learning models.
Dropout: a simple way to prevent neural networks from overfitting
It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
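The mechanism summarized above, randomly zeroing units during training, is commonly implemented as "inverted" dropout. A minimal NumPy sketch, assuming that standard formulation (a Bernoulli mask per unit, survivors rescaled by 1/(1 − p) so the expected activation matches the unscaled test-time forward pass):

```python
import numpy as np

def dropout(x, p_drop, rng, train=True):
    """Inverted dropout: during training, zero each unit with probability
    p_drop and rescale survivors by 1/(1 - p_drop); at test time, pass
    activations through unchanged."""
    if not train or p_drop == 0.0:
        return x
    mask = rng.random(x.shape) >= p_drop
    return x * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
h = np.ones(100_000)
out = dropout(h, 0.5, rng)
# Roughly half the units are zeroed and the rest scaled to 2.0,
# so the mean activation stays close to 1.
```

The rescaling is why no correction is needed at inference: each thinned sub-network is trained to produce outputs on the same scale as the full network.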