# Deep regularization and direct training of the inner layers of Neural Networks with Kernel Flows

@article{Yoo2020DeepRA, title={Deep regularization and direct training of the inner layers of Neural Networks with Kernel Flows}, author={Gene Ryan Yoo and Houman Owhadi}, journal={ArXiv}, year={2020}, volume={abs/2002.08335} }

## 11 Citations

Do ideas have shape? Plato's theory of forms as the continuous limit of artificial neural networks

- Computer Science, ArXiv
- 2020

It is shown that ResNets converge, in the infinite depth limit, to a generalization of image registration algorithms, and the registration regularization strategy provides a provably robust alternative to Dropout for ANNs.

Learning "best" kernels from data in Gaussian process regression. With application to aerodynamics

- Computer Science, SSRN Electronic Journal
- 2022

This work presents algorithms to select/design kernels in Gaussian process regression/kriging surrogate modeling techniques to solve the problem of approximating a regular target function given observations of it, i.e., supervised learning.

DeepParticle: learning invariant measure by a deep neural network minimizing Wasserstein distance on data generated from an interacting particle method

- Computer Science, J. Comput. Phys.
- 2022

Computational Graph Completion

- Computer Science, Research in the Mathematical Sciences
- 2022

The Computational Graph Completion (CGC) problem addressed by the proposed framework could therefore be interpreted as a generalization of that of solving linear systems of equations to that of approximating unknown variables and functions with noisy, incomplete, and nonlinear dependencies.

SpinalNet: Deep Neural Network with Gradual Input

- Computer Science, IEEE Transactions on Artificial Intelligence
- 2022

The human somatosensory system is studied and the SpinalNet is proposed to achieve higher accuracy with fewer computational resources; in the proposed architecture the vanishing gradient problem does not arise.

Deep Learning with Kernel Flow Regularization for Time Series Forecasting

- Computer Science, ArXiv
- 2021

This paper introduces the application of kernel flow methods to time series forecasting in general, and describes a regularization method that applies a kernel flow loss function to LSTM layers, achieving a regularization effect similar to dropout.

Simple, low-cost and accurate data-driven geophysical forecasting with learned kernels

- Computer Science, Proceedings of the Royal Society A
- 2021

The proposed approach is general, and the results support the viability of kernel methods (with learned kernels) for interpretable and computationally efficient geophysical forecasting for a large diversity of processes.

Data-driven geophysical forecasting: Simple, low-cost, and accurate baselines with kernel methods

- Computer Science
- 2021

This work shows that when the kernel of these emulators is also learned from data (using kernel flows, a variant of cross-validation), the resulting data-driven models are not only faster than equation-based models but also easier to train than neural networks such as the long short-term memory (LSTM) network.

Learning dynamical systems from data: a simple cross-validation perspective

- Computer Science, Physica D: Nonlinear Phenomena
- 2021

Variants of cross-validation (Kernel Flows and its variants based on Maximum Mean Discrepancy and Lyapunov exponents) are presented as simple approaches for learning the kernel used in these emulators.
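The Kernel Flows criterion referenced by several of the citing papers above can be sketched in a few lines. The idea is that a good kernel should lose little interpolation accuracy when half the data is removed; the loss is ρ = 1 − ‖v_c‖²_K / ‖v‖²_K, where v interpolates all points and v_c interpolates a random half. This is a minimal illustrative sketch (the RBF kernel, bandwidth grid, and jitter are choices for this example, not taken from the papers):

```python
import numpy as np

def rbf_kernel(X1, X2, gamma):
    # Squared-exponential kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_flow_rho(X, y, gamma, rng):
    # rho = 1 - ||v_c||_K^2 / ||v||_K^2, where v interpolates all N points
    # and v_c interpolates a random half.  Small rho means the kernel loses
    # little when half the data is dropped; 0 <= rho <= 1 for a PD kernel.
    N = len(X)
    idx = rng.choice(N, size=N // 2, replace=False)
    K = rbf_kernel(X, X, gamma) + 1e-8 * np.eye(N)  # jitter for stability
    Kc = K[np.ix_(idx, idx)]
    full = y @ np.linalg.solve(K, y)                 # ||v||_K^2
    half = y[idx] @ np.linalg.solve(Kc, y[idx])      # ||v_c||_K^2
    return 1.0 - half / full

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0])

# Scan candidate bandwidths and keep the one with the smallest rho;
# the full method instead minimizes rho by stochastic gradient descent.
gammas = [0.1, 1.0, 10.0]
best = min(gammas, key=lambda g: kernel_flow_rho(X, y, g, np.random.default_rng(1)))
```

Here the grid search stands in for the gradient-based minimization of ρ used in the Kernel Flows literature; the quantity being minimized is the same.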

Consistency of Empirical Bayes And Kernel Flow For Hierarchical Parameter Estimation

- Computer Science, Math. Comput.
- 2021

The purpose of this paper is to compare the empirical Bayesian and approximation theoretic approaches to hierarchical learning, in terms of large data consistency, variance of estimators, robustness of the estimators to model misspecification, and computational cost.

## References

Showing 1-10 of 17 references.

Wide Residual Networks

- Computer Science, BMVC
- 2016

This paper conducts a detailed experimental study on the architecture of ResNet blocks and proposes a novel architecture in which the depth of residual networks is decreased and their width increased; the resulting network structures, called wide residual networks (WRNs), are far superior to their commonly used thin and very deep counterparts.

Operator-Adapted Wavelets, Fast Solvers, and Numerical Homogenization

- Computer Science
- 2019

This introduction reviews, summarizes, and illustrates fundamental connections between Bayesian inference, numerical quadrature, Gaussian process regression, polyharmonic splines, information-based…

Cold Case: The Lost MNIST Digits

- Computer Science, NeurIPS
- 2019

A reconstruction that is accurate enough to serve as a replacement for the MNIST dataset, with insignificant changes in accuracy, is proposed, and the results unambiguously confirm the trends observed by Recht et al.

A Game Theoretic Approach to Numerical Approximation and Algorithm Design

- Computer Science
- 2018

In this talk, Professor Owhadi will examine the interplay between game theory, numerical approximation, and Gaussian process regression via multiscale analysis problems, fast solvers design, operator adapted wavelet identification, and computations with dense kernel matrices.

Wide neural networks of any depth evolve as linear models under gradient descent

- Computer Science, NeurIPS
- 2019

This work shows that for wide NNs the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.
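The linearization described in this abstract can be made concrete on a toy scale. For a small parameter update Δw, the network output is approximated by its first-order Taylor expansion around the initial parameters, f(x; w₀ + Δw) ≈ f(x; w₀) + ∇_w f(x; w₀)·Δw; the abstract's claim is that for very wide networks this approximation stays accurate throughout training. A minimal sketch with a hypothetical two-parameter "network" (not the paper's setup):

```python
import numpy as np

# Tiny two-parameter "network": f(x; w) = w2 * tanh(w1 * x).
def f(x, w):
    return w[1] * np.tanh(w[0] * x)

def grad_w(x, w):
    # Analytic gradient of f with respect to the parameters w = (w1, w2).
    t = np.tanh(w[0] * x)
    return np.array([w[1] * x * (1.0 - t ** 2), t])

w0 = np.array([0.7, -0.3])     # initial parameters
x = 1.5
dw = np.array([1e-3, -2e-3])   # a small parameter update, as in early training

exact = f(x, w0 + dw)
linear = f(x, w0) + grad_w(x, w0) @ dw  # first-order Taylor (linearized) model
# The discrepancy is second order in ||dw||, so for small updates the
# linearized model tracks the exact network closely.
```

In the infinite-width limit the analogue of `grad_w(x, w0)` defines the neural tangent kernel, and training the linearized model reduces to kernel regression with that kernel.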

Do ImageNet Classifiers Generalize to ImageNet?

- Computer Science, ICML
- 2019

The results suggest that the accuracy drops are not caused by adaptivity, but by the models' inability to generalize to slightly "harder" images than those found in the original test sets.

Kernel Flows: from learning kernels from data into the abyss

- Computer Science, J. Comput. Phys.
- 2019

Neural tangent kernel: convergence and generalization in neural networks (invited paper)

- Computer Science, NeurIPS
- 2018

This talk introduces the NTK formalism, gives a number of results on the Neural Tangent Kernel, and explains how they provide insight into the dynamics of neural networks during training and into their generalization features.

Do CIFAR-10 Classifiers Generalize to CIFAR-10?

- Computer Science, ArXiv
- 2018

This work measures the accuracy of CIFAR-10 classifiers by creating a new test set of truly unseen images and finds a large drop in accuracy for a broad range of deep learning models.

Dropout: a simple way to prevent neural networks from overfitting

- Computer Science, J. Mach. Learn. Res.
- 2014

It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
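The mechanism summarized above is simple enough to sketch directly. In the common "inverted" formulation, each activation is zeroed with probability p at train time and the survivors are rescaled by 1/(1−p), so the expected activation is unchanged and the layer becomes the identity at test time. A minimal sketch (the inverted-dropout variant is a standard implementation choice, not a detail taken from this paper):

```python
import numpy as np

def dropout(a, p, rng, train=True):
    # Inverted dropout: at train time, zero each activation with probability p
    # and rescale the survivors by 1/(1-p) so E[output] == input.
    # At test time the layer is the identity, so no rescaling is needed.
    if not train:
        return a
    mask = rng.random(a.shape) >= p
    return a * mask / (1.0 - p)

rng = np.random.default_rng(0)
a = np.ones(1000)
out = dropout(a, p=0.5, rng=rng)
# Roughly half the entries are zero, the rest are 2.0, and the mean of the
# layer output stays close to 1 for a large layer.
```

Because every forward pass samples a fresh mask, the network effectively trains an ensemble of thinned sub-networks, which is the source of the regularization effect described in the abstract.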