Corpus ID: 3766791

Automatic differentiation in machine learning: a survey

@article{Baydin2017AutomaticDI,
  title={Automatic differentiation in machine learning: a survey},
  author={Atilim Gunes Baydin and Barak A. Pearlmutter and Alexey Radul and J. Siskind},
  journal={J. Mach. Learn. Res.},
  year={2017},
  volume={18},
  pages={153:1-153:43}
}
Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in machine learning. Automatic differentiation (AD), also called algorithmic differentiation or simply “auto-diff”, is a family of techniques similar to but more general than backpropagation for efficiently and accurately evaluating derivatives of numeric functions expressed as computer programs. AD is a small but established field with applications in areas including computational fluid dynamics, atmospheric sciences…
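To make the idea concrete, here is a minimal sketch of forward-mode AD using dual numbers in Python; the Dual class, the function f, and the sample point are invented for illustration and are not the API of any tool discussed here.

import math

# A value paired with its derivative (tangent); every arithmetic operation
# propagates both, which is the essence of forward-mode AD.
class Dual:
    def __init__(self, value, tangent=0.0):
        self.value, self.tangent = value, tangent

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.tangent * other.value + self.value * other.tangent)

    __rmul__ = __mul__

def sin(x):
    # Chain rule through an elementary function.
    return Dual(math.sin(x.value), math.cos(x.value) * x.tangent)

def f(x):
    return sin(x * x) + 3.0 * x

# Seed the tangent with 1.0 to get df/dx along with f(x) in a single pass.
out = f(Dual(2.0, 1.0))
print(out.value, out.tangent)   # f(2) and f'(2) = 4*cos(4) + 3

The same machinery generalizes: reverse mode (backpropagation) instead records the operations and sweeps backwards, and nesting such transformations yields higher-order derivatives such as Hessians.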
An introduction to algorithmic differentiation
TLDR
This work provides an introduction to AD and presents its basic ideas and techniques, some of its most important results, the implementation paradigms it relies on, the connection it has to other domains including machine learning and parallel computing, and a few of the major open problems in the area.
DiffSharp: Automatic Differentiation Library
TLDR
DiffSharp aims to make an extensive array of AD techniques available, in convenient form, to the machine learning community, including arbitrary nesting of forward/reverse AD operations, AD with linear algebra primitives, and a functional API that emphasizes the use of higher-order functions and composition.
A benchmark of selected algorithmic differentiation tools on some problems in computer vision and machine learning
TLDR
15 ways of computing derivatives including 11 automatic differentiation tools implementing various methods and written in various languages (C++, F#, MATLAB, Julia and Python), 2 symbolic differentiation tools, finite differences and hand-derived computation are compared.
A review of automatic differentiation and its efficient implementation
  • C. Margossian
  • Computer Science, Mathematics
  • Wiley Interdiscip. Rev. Data Min. Knowl. Discov.
  • 2019
TLDR
Automatic differentiation is a powerful tool to automate the calculation of derivatives and is preferable to more traditional methods, especially when differentiating complex algorithms and mathematical functions.
BackPACK: Packing more into backprop
TLDR
BackPACK, an efficient framework built on top of PyTorch, is introduced; it extends the backpropagation algorithm to extract additional information from first- and second-order derivatives, addressing the fact that automatic differentiation frameworks do not natively support quantities such as the variance of the mini-batch gradients.
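As an illustration of one such quantity, the sketch below computes per-example gradients and their variance by hand with NumPy for a toy linear least-squares mini-batch; this is not BackPACK's API, and the model, shapes, and names are invented for the example.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))   # mini-batch of 32 examples, 5 features
y = rng.normal(size=32)
w = np.zeros(5)

# Per-example loss (x·w - y)^2 has gradient 2 * (x·w - y) * x.
residuals = X @ w - y                              # shape (32,)
per_example_grads = 2.0 * residuals[:, None] * X   # shape (32, 5)

mean_grad = per_example_grads.mean(axis=0)      # what plain backprop reports
grad_variance = per_example_grads.var(axis=0)   # extra per-parameter signal
print(mean_grad, grad_variance)

Standard backpropagation only reports the sum or mean over the batch, so the individual gradients, and hence their variance, are normally discarded; extracting them during the same backward pass is the kind of extension the paper describes.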
Learning Hidden Dynamics using Intelligent Automatic Differentiation
TLDR
Numerical tests demonstrate the feasibility of IAD for learning hidden dynamics in complicated systems of PDEs; additionally, by incorporating custom-built state adjoint method codes in IAD, this work significantly accelerates the forward and inverse simulations.
Conservative set valued fields, automatic differentiation, stochastic gradient method and deep learning
TLDR
This work introduces generalized derivatives, called conservative fields, for which a calculus is developed, and provides variational formulas for nonsmooth automatic differentiation oracles such as the famous backpropagation algorithm in deep learning.
On the Iteration Complexity of Hypergradient Computation
TLDR
A unified analysis is presented that, for the first time, allows a quantitative comparison of these methods, providing explicit bounds on their iteration complexity and suggesting a hierarchy among them in terms of computational efficiency.
Vector Forward Mode Automatic Differentiation on SIMD/SIMT architectures
TLDR
It is demonstrated that the forward mode can outperform the reverse mode for programs with tens or hundreds of directional derivatives, a number that may yet increase if current hardware trends continue.
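A rough sketch of the vector forward mode in plain Python: each value carries a whole vector of tangents, so several directional derivatives are propagated through the program in a single pass. The class and function here are made up for illustration and do not correspond to any particular tool.

import numpy as np

class VecDual:
    """A value together with a vector of tangents, one per seed direction."""
    def __init__(self, value, tangents):
        self.value = value
        self.tangents = np.asarray(tangents, dtype=float)

    def __add__(self, other):
        return VecDual(self.value + other.value, self.tangents + other.tangents)

    def __mul__(self, other):
        # Product rule applied elementwise to all tangent directions at once;
        # on SIMD/SIMT hardware this inner loop vectorizes naturally.
        return VecDual(self.value * other.value,
                       self.tangents * other.value + self.value * other.tangents)

def f(x, y):
    return x * y + x * x

# Seed both coordinate directions simultaneously to get the full gradient
# of this two-input function in one forward sweep.
x = VecDual(3.0, [1.0, 0.0])
y = VecDual(5.0, [0.0, 1.0])
out = f(x, y)
print(out.value, out.tangents)   # 24.0 and gradient [y + 2x, x] = [11.0, 3.0]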
A Differentiable Programming System to Bridge Machine Learning and Scientific Computing
TLDR
Zygote, a differentiable programming system able to take gradients of general program structures, is described; it supports almost all language constructs and compiles high-performance code without requiring any user intervention or refactoring to stage computations.

References

SHOWING 1-10 OF 276 REFERENCES
Automatic differentiation for computational finance
TLDR
The basic principles of AD and some available tools implementing this technology are reviewed; unlike divided differences or symbolic differentiation, the so-called reverse mode of AD can compute gradients of scalar-valued functions at a cost that is a constant multiple of evaluating the function itself.
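For intuition about that cost claim, here is a toy reverse-mode tape in Python: the gradient of a scalar-valued function of n inputs falls out of a single backward sweep, i.e. at a small constant multiple of one function evaluation, whereas divided differences need roughly n extra evaluations. The Var class and backward routine are invented for the example.

class Var:
    """A node in the computation graph: value, accumulated adjoint, parents."""
    def __init__(self, value):
        self.value, self.grad, self._parents = value, 0.0, []

    def __add__(self, other):
        out = Var(self.value + other.value)
        out._parents = [(self, 1.0), (other, 1.0)]   # local partial derivatives
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value)
        out._parents = [(self, other.value), (other, self.value)]
        return out

def backward(output):
    """Push adjoint contributions from the output back to every input."""
    stack = [(output, 1.0)]
    while stack:
        node, upstream = stack.pop()
        node.grad += upstream
        for parent, local_grad in node._parents:
            stack.append((parent, upstream * local_grad))

xs = [Var(float(i + 1)) for i in range(4)]   # inputs x1..x4 = 1, 2, 3, 4
loss = xs[0] * xs[1] + xs[2] * xs[3]         # scalar output
backward(loss)
print([x.grad for x in xs])                  # [2.0, 1.0, 4.0, 3.0]

In a real tape-based implementation the backward sweep visits each recorded operation once, regardless of the number of inputs, which is why the full gradient costs only a constant factor more than the function evaluation itself.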
Automatic differentiation in geophysical inverse problems
SUMMARY Automatic differentiation (AD) is the technique whereby output variables of a computer code evaluating any complicated function (e.g. the solution to a differential equation) can be…
Modeling, Inference and Optimization With Composable Differentiable Procedures
TLDR
It is shown that early stopping and ensembling, popular tricks for avoiding overfitting, can be interpreted as variational Bayesian inference, and it is shown how to compute gradients of cross-validation loss with respect to hyperparameters of learning algorithms with both time and memory efficiency.
Evaluating derivatives - principles and techniques of algorithmic differentiation, Second Edition
TLDR
This second edition has been updated and expanded to cover recent developments in applications and theory, including an elegant NP-completeness argument by Uwe Naumann and a brief introduction to scarcity, a generalization of sparsity.
On the numerical stability of algorithmic differentiation
TLDR
If the function is defined by an evaluation procedure as a composition of arithmetic operations and elementary functions, then automatic, or algorithmic, differentiation is backward stable in the sense of Wilkinson, and the derivative values obtained are exact for a perturbation of the elementary components at the level of the machine precision.
Forward-Mode Automatic Differentiation in Julia
TLDR
ForwardDiff takes advantage of just-in-time (JIT) compilation to transparently recompile AD-unaware user code, enabling efficient support for higher-order differentiation and differentiation using custom number types.
Algorithmic Differentiation: Application to Variational Problems in Computer Vision
TLDR
This paper discretizes the energy functional and subsequently applies the mathematical concept of algorithmic differentiation to directly derive algorithms that implement the energy functional's derivatives.
Optimization Methods for Large-Scale Machine Learning
TLDR
A major theme of this study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter, leading to a discussion about the next generation of optimization methods for large-scale machine learning.
Gaussian Processes for Machine Learning
TLDR
The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics, and deals with the supervised learning problem for both regression and classification.
Tricks from Deep Learning
TLDR
Two ideas are discussed: a way to dramatically reduce the size of the tape when performing reverse-mode AD on a (theoretically) time-reversible process such as an ODE integrator, and a new mathematical insight that allows the implementation of a stochastic Newton's method.