Corpus ID: 21596346

Sobolev Training for Neural Networks

@inproceedings{Czarnecki2017SobolevTF,
  title={Sobolev Training for Neural Networks},
  author={Wojciech M. Czarnecki and Simon Osindero and Max Jaderberg and Grzegorz Swirszcz and Razvan Pascanu},
  booktitle={NIPS},
  year={2017}
}
At the heart of deep learning we aim to use neural networks as function approximators, training them to produce outputs from inputs in emulation of a ground-truth function or data-creation process. By optimising neural networks to approximate not only the function's outputs but also the function's derivatives, we encode additional information about the target function within the parameters of the neural network. Thereby we can improve the quality of our predictors, as well as the data…
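As a concrete illustration of the idea, below is a minimal sketch of one first-order Sobolev training step, assuming a PyTorch-style setup with a differentiable ground-truth function; the architecture, target function, and equal loss weighting are illustrative choices, not taken from the paper.

```python
# Hypothetical sketch of a first-order Sobolev training step in PyTorch.
# The target function and network sizes are illustrative, not from the paper.
import torch
import torch.nn as nn

def target_fn(x):
    # Ground-truth function whose values and derivatives supervise the network.
    return torch.sin(3.0 * x).sum(dim=1, keepdim=True)

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.randn(128, 2, requires_grad=True)

# Target values and target input-derivatives (via autograd on the ground truth).
y = target_fn(x)
dy_dx, = torch.autograd.grad(y.sum(), x)
y, dy_dx = y.detach(), dy_dx.detach()

pred = net(x)
dpred_dx, = torch.autograd.grad(pred.sum(), x, create_graph=True)

# Sobolev loss: match both the outputs and the first derivatives of the target.
loss = nn.functional.mse_loss(pred, y) + nn.functional.mse_loss(dpred_dx, dy_dx)
opt.zero_grad()
loss.backward()
opt.step()
```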

Citations

Sobolev Training for Physics Informed Neural Networks

Inspired by recent studies that incorporate derivative information for the training of neural networks, a loss function is developed that guides a neural network to reduce the error in the corresponding Sobolev space, making the training substantially more efficient.

Sobolev Training for Implicit Neural Representations with Approximated Image Derivatives

This paper proposes a training paradigm for INRs whose target output is image pixels, to encode image derivatives in addition to image values in the neural network, and uses finite differences to approximate image derivatives.
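A minimal sketch of the finite-difference ingredient mentioned above, assuming a grayscale image stored as a NumPy array; how the resulting derivative maps are weighted against pixel values when fitting the INR is not shown here and would follow the paper.

```python
# Illustrative sketch (not the paper's code): central finite differences to
# approximate the spatial derivatives of a grayscale image, which can then be
# used as derivative targets when fitting an implicit neural representation.
import numpy as np

def image_gradients(img: np.ndarray):
    """Approximate d(img)/dx and d(img)/dy with central differences.

    img: 2-D array of pixel intensities indexed as img[row, col].
    Returns (d_dx, d_dy) with the same shape, using one-sided differences
    at the borders (np.gradient's default behaviour).
    """
    d_dy, d_dx = np.gradient(img)  # np.gradient returns per-axis derivatives
    return d_dx, d_dy

# Example: derivatives of a synthetic ramp image are constant, as expected.
img = np.outer(np.ones(8), np.arange(8, dtype=float))
d_dx, d_dy = image_gradients(img)
print(d_dx[3, 3], d_dy[3, 3])  # -> 1.0 0.0
```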

Sobolev Training with Approximated Derivatives for Black-Box Function Regression with Neural Networks

This paper presents a training pipeline that enables Sobolev Training for regression problems where target derivatives are not directly available and proposes to use a least-squares estimate of the target derivatives based on function values of neighboring training samples.
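A minimal sketch of such a least-squares derivative estimate, assuming Euclidean neighbourhoods and scalar targets; the neighbour selection and weighting used in the paper may differ.

```python
# Minimal sketch: estimate the gradient of a black-box function at x0 from the
# function values of neighbouring training samples via a local linear fit.
import numpy as np

def estimate_gradient(x0, y0, X_nbrs, y_nbrs):
    """Solve (X_nbrs - x0) @ g ~= (y_nbrs - y0) for g in the least-squares sense."""
    A = X_nbrs - x0          # (k, d) displacement of each neighbour
    b = y_nbrs - y0          # (k,)  change in function value
    g, *_ = np.linalg.lstsq(A, b, rcond=None)
    return g                 # (d,) estimated gradient, usable as a Sobolev target

# Example on f(x) = 3*x1 - 2*x2 (true gradient [3, -2]).
rng = np.random.default_rng(0)
x0 = np.array([0.5, -0.1])
f = lambda x: 3 * x[..., 0] - 2 * x[..., 1]
X_nbrs = x0 + 0.05 * rng.standard_normal((10, 2))
print(estimate_gradient(x0, f(x0), X_nbrs, f(X_nbrs)))  # ~ [ 3. -2.]
```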

JacNet: Learning Functions with Structured Jacobians

This work proposes to directly learn the Jacobian of the input-output function with a neural network, which allows easy control of the derivative; it focuses on structuring the derivative to allow invertibility, and also demonstrates that other useful priors can be enforced.

How degenerate is the parametrization of neural networks with the ReLU activation function?

The pathologies which prevent inverse stability in general are presented, and it is shown that by optimizing over suitably restricted sets of parametrizations, it is still possible to learn any function which can be learned by optimization over unrestricted sets.

Global Convergence of Sobolev Training for Overparametrized Neural Networks

This work proves that an overparameterized two-layer ReLU neural network trained on the Sobolev loss with gradient flow from random initialization can fit any given function values and any given directional derivatives, under a separation condition on the input data.

Gradient Regularization Improves Accuracy of Discriminative Models

It is demonstrated through experiments on real and synthetic tasks that stochastic gradient descent is unable to find locally optimal but globally unproductive solutions, and is forced to find solutions that generalize well.
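One common form of gradient regularization is a double-backpropagation penalty on the input gradient of the loss; the sketch below illustrates that idea and is not necessarily the exact penalty used in the cited paper (the weight lam is an arbitrary example value).

```python
# Illustrative double-backpropagation sketch of input-gradient regularization.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
lam = 0.1  # example penalty weight

x = torch.randn(16, 10, requires_grad=True)
y = torch.randint(0, 3, (16,))

ce = F.cross_entropy(model(x), y)

# Gradient of the data loss w.r.t. the inputs, kept in the graph so that the
# penalty itself can be differentiated w.r.t. the model parameters.
grad_x, = torch.autograd.grad(ce, x, create_graph=True)
loss = ce + lam * grad_x.pow(2).sum(dim=1).mean()

opt.zero_grad()
loss.backward()
opt.step()
```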

Neuron Manifold Distillation for Edge Deep Learning

  • Zeyi Tao, Qi Xia, Qun Li
  • 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQOS), 2021
This work proposes a novel neuron manifold distillation (NMD), in which the student models not only imitate the teacher's output activations but also learn the feature geometry of the teacher.

Learning to solve the credit assignment problem

A hybrid learning approach that learns to approximate the gradient, and can match or exceed the performance of exact gradient-based learning in both feedforward and convolutional networks.

Smooth Mathematical Function from Compact Neural Networks

This study obtains neural networks that generate highly accurate and highly smooth functions while using only a few weight parameters, through a discussion of several topics in regression.
...

References

SHOWING 1-10 OF 38 REFERENCES

Decoupled Neural Interfaces using Synthetic Gradients

It is demonstrated that in addition to predicting gradients, the same framework can be used to predict inputs, resulting in models which are decoupled in both the forward and backwards pass -- amounting to independent networks which co-learn such that they can be composed into a single functioning corporation.
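A toy sketch of the decoupled-interface idea, assuming a two-block network in which a small linear module predicts the gradient at the interface; the update schedule and module architecture here are simplifications, not the authors' implementation.

```python
# Toy sketch of a decoupled neural interface: a small module predicts dL/dh for
# hidden activations h, so the lower layer can be updated from the synthetic
# gradient instead of waiting for full backpropagation.
import torch
import torch.nn as nn
import torch.nn.functional as F

lower = nn.Linear(20, 50)
upper = nn.Linear(50, 10)
sg_module = nn.Linear(50, 50)   # predicts the gradient at the interface

opt_lower = torch.optim.Adam(lower.parameters(), lr=1e-3)
opt_upper = torch.optim.Adam(
    list(upper.parameters()) + list(sg_module.parameters()), lr=1e-3)

x = torch.randn(32, 20)
y = torch.randint(0, 10, (32,))

# Forward through the lower block; update it immediately with the synthetic gradient.
h = torch.relu(lower(x))
synthetic_grad = sg_module(h.detach())
opt_lower.zero_grad()
h.backward(synthetic_grad.detach())
opt_lower.step()

# Upper block: real loss, real gradient at the interface, and a regression loss
# that trains the synthetic-gradient module towards the true gradient.
h_detached = h.detach().requires_grad_(True)
loss = F.cross_entropy(upper(h_detached), y)
true_grad, = torch.autograd.grad(loss, h_detached, retain_graph=True)
sg_loss = F.mse_loss(sg_module(h_detached.detach()), true_grad.detach())

opt_upper.zero_grad()
(loss + sg_loss).backward()
opt_upper.step()
```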

Deep Residual Learning for Image Recognition

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
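A minimal residual block in the spirit of this framework (identity shortcut only; the original paper also uses projection shortcuts and batch normalisation, which are omitted here).

```python
# Minimal residual block sketch: the block learns a residual F(x) and outputs F(x) + x.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = torch.relu(self.conv1(x))
        out = self.conv2(out)
        return torch.relu(out + x)  # identity shortcut

print(ResidualBlock(16)(torch.randn(1, 16, 8, 8)).shape)  # torch.Size([1, 16, 8, 8])
```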

Distilling the Knowledge in a Neural Network

This work shows that it can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.
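The commonly used distillation objective combines a temperature-scaled KL term on softened logits with the usual cross-entropy on hard labels; the sketch below follows that formulation (the temperature and mixing weight are example values).

```python
# Sketch of the standard distillation loss: softened teacher/student KL + hard-label CE.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: KL between temperature-softened teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```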

Higher Order Contractive Auto-Encoder

A novel regularizer when training an autoencoder for unsupervised feature extraction yields representations that are significantly better suited for initializing deep architectures than previously proposed approaches, beating state-of-the-art performance on a number of datasets.
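For a sigmoid encoder the first-order contractive penalty has a closed form; the sketch below implements only that first-order term and omits the stochastic higher-order penalty that the cited paper adds (the penalty weight is an arbitrary example).

```python
# Sketch of the first-order contractive penalty for a sigmoid encoder:
# ||dh/dx||_F^2 = sum_j (h_j(1-h_j))^2 * ||W_j||^2, averaged over the batch.
import torch
import torch.nn as nn

enc = nn.Linear(784, 128)
dec = nn.Linear(128, 784)
x = torch.rand(32, 784)

h = torch.sigmoid(enc(x))
recon = torch.sigmoid(dec(h))

recon_loss = nn.functional.mse_loss(recon, x)
w_sq = (enc.weight ** 2).sum(dim=1)                    # (128,) squared row norms
jac_pen = ((h * (1 - h)) ** 2 * w_sq).sum(dim=1).mean()  # squared Frobenius norm of dh/dx

loss = recon_loss + 0.1 * jac_pen                      # 0.1 is an example weight
loss.backward()
```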

Revisiting Natural Gradient for Deep Networks

It is described how one can use unlabeled data to improve the generalization error obtained by natural gradient and empirically evaluate the robustness of the algorithm to the ordering of the training set compared to stochastic gradient descent.

On learning the derivatives of an unknown mapping with multilayer feedforward networks

A Connection Between Score Matching and Denoising Autoencoders

A proper probabilistic model for the denoising autoencoder technique is defined, which makes it in principle possible to sample from them or rank examples by their energy, and a different way to apply score matching that is related to learning to denoise and does not require computing second derivatives is suggested.
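A sketch of the resulting denoising score matching objective: a score model is regressed onto the score of the Gaussian corruption, which avoids computing second derivatives (network sizes and the noise level are arbitrary example values).

```python
# Sketch of denoising score matching: train s_theta(x_noisy) ~= (x - x_noisy) / sigma^2.
import torch
import torch.nn as nn

score_net = nn.Sequential(nn.Linear(32, 64), nn.Softplus(), nn.Linear(64, 32))
sigma = 0.1

x = torch.randn(16, 32)
x_noisy = x + sigma * torch.randn_like(x)

target_score = (x - x_noisy) / sigma**2   # score of the Gaussian corruption kernel
loss = nn.functional.mse_loss(score_net(x_noisy), target_score)
loss.backward()
```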

Tangent Prop - A Formalism for Specifying Selected Invariances in an Adaptive Network

A scheme is implemented that allows a network to learn the derivative of its outputs with respect to distortion operators of their choosing, which not only reduces the learning time and the amount of training data, but also provides a powerful language for specifying what generalizations the authors wish the network to perform.
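A sketch in the spirit of Tangent Prop: the directional derivative of the network output along a known invariance direction is penalised. Here the derivative is approximated with a finite difference rather than the original formalism, and the tangent vectors are stand-ins for the true distortion tangents.

```python
# Tangent-Prop-style penalty on the directional derivative along an invariance direction.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(16, 32), nn.Tanh(), nn.Linear(32, 4))
x = torch.randn(8, 16)
tangent = torch.randn(8, 16)   # e.g. d(transform(x))/d(angle) at angle 0
eps = 1e-3

# Directional derivative J(x) @ tangent, approximated by a finite difference.
dir_deriv = (net(x + eps * tangent) - net(x)) / eps
tangent_penalty = dir_deriv.pow(2).sum(dim=1).mean()

# This penalty would be added to the usual task loss with some weight.
(0.1 * tangent_penalty).backward()
```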

Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding

This work introduces "deep compression", a three stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
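A toy magnitude-pruning step illustrating only the first stage of that pipeline; trained quantization and Huffman coding are omitted, and the sparsity target is an arbitrary example.

```python
# Toy magnitude pruning: zero out the smallest-magnitude weights of a layer.
import torch
import torch.nn as nn

layer = nn.Linear(256, 256)
sparsity = 0.9  # example target: prune 90% of the weights

with torch.no_grad():
    w = layer.weight
    threshold = torch.quantile(w.abs().flatten(), sparsity)
    mask = (w.abs() >= threshold).float()
    w.mul_(mask)  # pruned weights set to zero; the mask would be reapplied after updates

print(f"remaining weights: {int(mask.sum())} / {mask.numel()}")
```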

Deep Model Compression: Distilling Knowledge from Noisy Teachers

This work extends the teacher-student framework for deep model compression to include a noise-based regularizer while training the student from the teacher, which provides a healthy improvement in the performance of the student network.