• Corpus ID: 53093859

# Automatic differentiation in ML: Where we are and where we should be going

@inproceedings{Merrienboer2018AutomaticDI,
title={Automatic differentiation in ML: Where we are and where we should be going},
author={Bart van Merrienboer and Olivier Breuleux and Arnaud Bergeron and Pascal Lamblin},
booktitle={NeurIPS},
year={2018}
}
• Published in NeurIPS 1 October 2018
• Computer Science
We review the current state of automatic differentiation (AD) for array programming in machine learning (ML), including the different approaches such as operator overloading (OO) and source transformation (ST) used for AD, graph-based intermediate representations for programs, and source languages. Based on these insights, we introduce a new graph-based intermediate representation (IR) which specifically aims to efficiently support fully-general AD for array programming. Unlike existing…
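As a rough illustration of the operator overloading (OO) approach named in the abstract, the sketch below implements forward-mode AD with dual numbers. It is not taken from the paper and does not show the graph-based IR the authors propose; the names `Dual`, `_lift`, `sin`, and `derivative` are hypothetical and chosen only for this example.

```python
import math


class Dual:
    """A value paired with its tangent (derivative w.r.t. the input)."""

    def __init__(self, value, tangent=0.0):
        self.value = value
        self.tangent = tangent

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(
            self.value * other.value,
            self.tangent * other.value + self.value * other.tangent,
        )

    __rmul__ = __mul__


def _lift(x):
    """Wrap plain numbers as constants (tangent 0); hypothetical helper."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x):
    """Overloaded sine: chain rule applied to the tangent."""
    x = _lift(x)
    return Dual(math.sin(x.value), math.cos(x.value) * x.tangent)


def derivative(f, x):
    """Evaluate df/dx at x by seeding the input tangent with 1."""
    return f(Dual(float(x), 1.0)).tangent


def f(x):
    # Ordinary-looking user code; the overloaded operators carry derivatives.
    return x * x + 3 * sin(x)


# Analytically, f'(x) = 2x + 3 cos(x).
print(derivative(f, 2.0), 2 * 2.0 + 3 * math.cos(2.0))
```

The user function `f` is written without any reference to derivatives; the overloaded operators propagate them at run time. A source transformation (ST) tool would instead generate a separate derivative function from the program's source ahead of execution.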
43 Citations


AutoGraph: Imperative-style Coding with Graph-based Performance
• Computer Science
MLSys
• 2019
This work describes how the use of staged programming in Python, via source code transformation, offers a midpoint between these two library design patterns, capturing the benefits of both machine learning and imperative programming.
Graph Tracking in Dynamic Probabilistic Programs via Source Transformations
• Computer Science
• 2019
Many machine learning methods acting on graph structures can be expressed in terms of message passing, among them variational methods for approximate Bayesian inference, automatic differentiation (AD), and backpropagation.
The 800 Pound Python in the Machine Learning Room
• Computer Science
• 2018
The ability to overcome shortcomings by performing a relatively simple source-to-source transformation that allows operator overloading techniques to be extended to language built-ins, including control flow operators, function definitions, etc., is demonstrated.
Denotational Correctness of Foward-Mode Automatic Differentiation for Iteration and Recursion
A new notion of space is introduced, suitable for modeling both recursion and differentiation, by equipping a diffeological space with a compatible $\omega$cpo-structure, and it is demonstrated that the whole development extends to this setting.
• Computer Science
• 2019
This paper proposes a structured methodology to allow DSL developers to use the whole of Python as a front-end, rather than creating equivalent APIs or relying on shims, and implements it in a system called Snek, which represents the first type-driven multi-stage programming framework for a dynamic language which does not require extra-linguistic mechanisms.
Don't Unroll Adjoint: Differentiating SSA-Form Programs
This paper presents reverse-mode algorithmic differentiation based on source code transformation, in particular of the Static Single Assignment form used by modern compilers, and presents a new AD tool for the Julia language, called Zygote, which presents high-level dynamic semantics while transparently compiling adjoint code under the hood.
ALGORITHMIC DIFFERENTIATION
Zygote is designed to address the needs of both the machine learning and scientific computing communities, who have historically been siloed by their very different tools, and to enable differentiable programming (∂P), in which arbitrary numerical programs can make use of gradient-based optimisation.
A Brief Introduction to Automatic Differentiation for Machine Learning
This report describes automatic differentiation, its motivations, and different implementation approaches, and briefly describes dataflow programming as it relates to AD.
Distillation of Weighted Automata from Recurrent Neural Networks using a Spectral Approach
• Computer Science
Machine Learning
• 2021
This paper provides an algorithm to extract a (stochastic) formal language from any recurrent neural network trained for language modelling and applies a spectral approach to infer a weighted automaton.
An Introduction to Automatic Differentiation for Machine Learning
This report describes automatic differentiation, its motivations, and different implementation approaches, and briefly describes dataflow programming as it relates to AD.

## References

SHOWING 1-10 OF 45 REFERENCES
DiffSharp: An AD Library for .NET Languages
• Computer Science
ArXiv
• 2016
DiffSharp is an algorithmic differentiation or automatic differentiation (AD) library for the .NET ecosystem, which is targeted by the C# and F# languages, among others. The library has been designed
Reverse-mode AD in a functional framework: Lambda the ultimate backpropagator
• Computer Science
TOPL
• 2008
We show that reverse-mode AD (Automatic Differentiation)—a generalized gradient-calculation operator—can be incorporated as a first-class function in an augmented lambda calculus, and therefore into
Efficient Implementation of a Higher-Order Language with Built-In AD
• Computer Science
ArXiv
• 2016
We show that Automatic Differentiation (AD) operators can be provided in a dynamic language without sacrificing numeric performance. To achieve this, general forward and reverse AD functions are
A graph-based higher-order intermediate representation
• Computer Science
2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
• 2015
Thorin is presented: a higher-order, functional IR based on continuation-passing style that abandons explicit scope nesting in favor of a dependency graph that makes Thorin an attractive IR for both imperative as well as functional languages.
Forward-Mode Automatic Differentiation in Julia
• Computer Science
ArXiv
• 2016
ForwardDiff takes advantage of just-in-time (JIT) compilation to transparently recompile AD-unaware user code, enabling efficient support for higher-order differentiation and differentiation using custom number types.
TBR Analysis in Reverse-Mode Automatic Differentiation
• Computer Science
• 2003
The automatic generation of adjoints of mathematical models that are implemented as computer programs is receiving increased attention in the scientific and engineering communities. Reverse-mode
Using Polyvariant Union-Free Flow Analysis to Compile a Higher-Order Functional-Programming Language with a First-Class Derivative Operator to Efficient Fortran-like Code
• Computer Science
• 2008
The compiler’s performance is competitive with FORTRAN-based systems on the authors' numerical examples, despite the potential inefficiencies entailed by support of a functional-programming language and a first-class AD operator.
Tangent: Automatic Differentiation Using Source Code Transformation in Python
• Computer Science
ArXiv
• 2017
Tangent is a new library that performs AD using source code transformation (SCT) in Python, and takes numeric functions written in a syntactic subset of Python and NumPy as input, and generates new Python functions which calculate a derivative.
Control-flow analysis of higher-order languages or taming lambda
This dissertation presents a technique for recovering the control-flow graph of a Scheme program at compile time, and gives examples of how this information can be used to perform several data-flow analysis optimisations, including copy propagation, induction-variable elimination, useless-variable elimination, and type recovery.
Theano: A Python framework for fast computation of mathematical expressions
• Computer Science
ArXiv
• 2016
The performance of Theano is compared against Torch7 and TensorFlow on several machine learning models and recently-introduced functionalities and improvements are discussed.