Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in machine learning. Automatic differentiation (AD) is a technique for calculating derivatives of numeric functions expressed as computer programs efficiently and accurately, used in fields such as computational fluid dynamics, nuclear engineering, and atmospheric sciences. Despite …
Automatic differentiation—the mechanical transformation of numeric computer programs to calculate derivatives efficiently and accurately—dates to the origin of the computer age. Reverse mode automatic differentiation both antedates and generalizes the method of backwards propagation of errors used in machine learning. Despite this, practitioners in a …
In this paper we introduce DiffSharp, an automatic differentiation (AD) library designed with machine learning in mind. AD is a family of techniques that evaluate derivatives at machine precision with only a small constant factor of overhead, by systematically applying the chain rule of calculus at the elementary operator level. DiffSharp aims to make an …
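The chain-rule-at-the-elementary-operator idea described above can be illustrated with a minimal forward-mode AD sketch using dual numbers. This is a generic illustration, not DiffSharp's actual API; the `Dual`, `sin`, and `derivative` names are hypothetical and chosen for this example only.

```python
import math

class Dual:
    """A number carrying its value and its derivative with respect to the input."""
    def __init__(self, val, dot=0.0):
        self.val = val   # primal value
        self.dot = dot   # tangent (derivative)

    def _wrap(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._wrap(other)
        # sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = self._wrap(other)
        # product rule: (u * v)' = u' v + u v'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def sin(x):
    # chain rule applied at one elementary operator: (sin u)' = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

def derivative(f, x):
    # seed the input with tangent 1.0 and read off the output tangent
    return f(Dual(x, 1.0)).dot

# f(x) = x * sin(x), so f'(x) = sin(x) + x * cos(x)
d = derivative(lambda x: x * sin(x), 2.0)
```

Because every elementary operator propagates both a value and a tangent, the result is exact to machine precision, with only a constant-factor overhead over the primal computation, which is the property the abstract refers to.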
We introduce a method for using deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages, establishing a framework that combines the strengths of probabilistic programming and deep learning methods. We call what we do "compilation of inference" because our method transforms …
We present the results of our analysis of publication venues for papers on automatic differentiation (AD), covering academic journals and conference proceedings. Our data are collected from the AD publications database maintained by the autodiff.org community website (http://www.autodiff.org/). The database is purpose-built for the AD field and is …
We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We analyze the effectiveness of the method by applying it to stochastic gradient descent, stochastic gradient descent with Nesterov momentum, and Adam, showing that it improves upon these commonly used …
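One of the baselines named above, SGD with Nesterov momentum, can be sketched as follows. This shows the standard textbook update rule only; it is not the paper's proposed improvement, whose details are elided in the abstract. The function name `sgd_nesterov` and its parameters are illustrative.

```python
import numpy as np

def sgd_nesterov(grad, w0, lr=0.1, momentum=0.9, steps=100):
    # Standard SGD with Nesterov momentum: the gradient is evaluated at the
    # "look-ahead" point w + momentum * v rather than at w itself.
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        g = grad(w + momentum * v)   # look-ahead gradient
        v = momentum * v - lr * g    # velocity update
        w = w + v                    # parameter update
    return w

# Minimize f(w) = ||w||^2 (gradient 2w); the iterates converge to the origin.
w_min = sgd_nesterov(lambda w: 2 * w, [3.0, -4.0])
```

Evaluating the gradient at the look-ahead point is what distinguishes Nesterov momentum from classical (heavy-ball) momentum, and is the usual reason it tolerates larger momentum coefficients in practice.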