Corpus ID: 20099048

Auto-Differentiating Linear Algebra

Matthias W. Seeger, Asmus Hetzel, Zhenwen Dai, Neil D. Lawrence
Development systems for deep learning, such as Theano, Torch, TensorFlow, or MXNet, are easy-to-use tools for creating complex neural network models. Since gradient computations are automatically baked in, and execution is mapped to high performance hardware, these models can be trained end-to-end on large amounts of data. However, it is currently not easy to implement many basic machine learning primitives in these systems (such as Gaussian processes, least squares estimation, principal…
Differentiable Programming Tensor Networks
This work presents essential techniques for differentiating through tensor network contractions, including stable AD for tensor decompositions and efficient backpropagation through fixed-point iterations, and removes the laborious human effort of deriving and implementing analytical gradients for tensor network programs.
A Simple and Efficient Tensor Calculus for Machine Learning
It is shown that Ricci notation is not necessary for an efficient tensor calculus, and an equally efficient method for the simpler Einstein notation is developed; it turns out that switching to Einstein notation enables further improvements that lead to even better efficiency.
Scalable Hyperparameter Transfer Learning
This work proposes a multi-task adaptive Bayesian linear regression model for transfer learning in BO, whose complexity is linear in the number of function evaluations: one Bayesian linear regression model is associated with each black-box function optimization problem (or task), while transfer learning is achieved by coupling the models through a shared deep neural net.
Computing Higher Order Derivatives of Matrix and Tensor Expressions
This work presents an algorithmic framework for computing matrix and tensor derivatives that extends seamlessly to higher order derivatives and shows a speedup of between one and four orders of magnitude over state-of-the-art frameworks when evaluating higher order derivatives.
Banded Matrix Operators for Gaussian Markov Models in the Automatic Differentiation Era
The aim of the paper is to make modern inference methods available for Gaussian models with banded precision by equipping an automatic differentiation framework, such as TensorFlow or PyTorch, with linear algebra operators dedicated to banded matrices.
Use and implementation of autodifferentiation in tensor network methods with complex scalars
The feasibility of implementing autodifferentiation in standard tensor network toolkits is discussed, and the current status is summarised for the case where the underlying scalars are complex rather than real and the final result is a real-valued scalar.
Provably Correct Automatic Subdifferentiation for Qualified Programs
The main result shows that, under certain restrictions on the library of non-smooth functions, provably correct generalized sub-derivatives can be computed at a computational cost that is within a (dimension-free) factor of $6$ of the cost of computing the scalar function itself.
Deep Factors for Forecasting
A hybrid model that incorporates the benefits of both classical models and deep neural networks is proposed, which is data-driven and scalable via a latent, global, deep component, and handles uncertainty through a local classical model.
QR and LQ Decomposition Matrix Backpropagation Algorithms for Square, Wide, and Deep Matrices and Their Software Implementation
This article presents matrix backpropagation algorithms for the QR decomposition of matrices $A_{m, n}$ that are either square ($m = n$), wide ($m < n$), or deep ($m > n$), with rank $k = \min(m, n)$. Furthermore, we derive…
Differentiate Everything with a Reversible Programming Language
NiLang, a reversible eDSL in Julia, is developed that can differentiate a general program while remaining compatible with Julia’s ecosystem, demonstrating that a source-to-source AD framework can achieve state-of-the-art performance.


Deep Kernel Learning
We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, we transform the inputs…
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Deep Generative Stochastic Networks Trainable by Backprop
Theorems that generalize recent work on the probabilistic interpretation of denoising autoencoders are provided, yielding along the way an interesting justification for dependency networks and generalized pseudolikelihood.
Generative Moment Matching Networks
This work formulates a method that generates an independent sample via a single feedforward pass through a multilayer perceptron, as in the recently proposed generative adversarial networks, using MMD to learn to generate codes that can then be decoded to produce samples.
Auto-Encoding Variational Bayes
A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
Variational Auto-encoded Deep Gaussian Processes
A new formulation of the variational lower bound is derived that allows most of the computation to be distributed, making it possible to handle datasets of the size of mainstream deep learning tasks.
Machine learning - a probabilistic perspective
  • K. Murphy
  • Computer Science
  • Adaptive computation and machine learning series
  • 2012
This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Gaussian Processes for Machine Learning
The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics, and deals with the supervised learning problem for both regression and classification.
Stochastic Back-propagation and Variational Inference in Deep Latent Gaussian Models
We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and…
Generative Adversarial Networks
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a…