Corpus ID: 2361467

Differentiation of the Cholesky decomposition

@article{Murray2016DifferentiationOT,
title={Differentiation of the Cholesky decomposition},
author={Iain Murray},
journal={ArXiv},
year={2016},
volume={abs/1602.07527}
}
• Iain Murray
• Published 24 February 2016
• Mathematics, Computer Science
• ArXiv
We review strategies for differentiating matrix-based computations, and derive symbolic and algorithmic update rules for differentiating expressions containing the Cholesky decomposition. We recommend new "blocked" algorithms, based on differentiating the Cholesky algorithm DPOTRF in the LAPACK library, which uses "Level 3" matrix-matrix operations from BLAS, and so is cache-friendly and easy to parallelize. For large matrices, the resulting algorithms are the fastest way to compute Cholesky…
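The reverse-mode update rule the abstract alludes to can be sketched in a few lines of NumPy. This is a hedged illustration of the standard unblocked Cholesky pull-back (the form used by autograd/JAX-style autodiff systems), not the blocked LAPACK-style routine the paper recommends; `chol_rev` and the finite-difference check are illustrative names, not code from the paper.

```python
import numpy as np
from scipy.linalg import solve_triangular

def chol_rev(L, Lbar):
    """Reverse-mode update for the Cholesky factorization A = L @ L.T.

    Given L = chol(A) and Lbar = df/dL, return the symmetric
    Abar = df/dA via  Abar = sym(inv(L).T @ Phi(L.T @ Lbar) @ inv(L)),
    where Phi takes the lower triangle and halves the diagonal.
    """
    n = L.shape[0]
    P = np.tril(L.T @ Lbar)
    P[np.diag_indices(n)] *= 0.5
    S = solve_triangular(L, P.T, lower=True, trans='T').T  # P @ inv(L)
    S = solve_triangular(L, S, lower=True, trans='T')      # inv(L).T @ P @ inv(L)
    return (S + S.T) / 2.0

# Finite-difference check on f(A) = sum of the entries of chol(A)
rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                   # symmetric positive definite
L = np.linalg.cholesky(A)
Abar = chol_rev(L, np.tril(np.ones((n, n))))  # Lbar = df/dL = tril of ones

f = lambda M: np.sum(np.linalg.cholesky(M))
eps = 1e-6
E = np.zeros((n, n)); E[1, 2] = E[2, 1] = 1.0  # symmetric perturbation
fd = (f(A + eps * E) - f(A - eps * E)) / (2 * eps)
# fd should match 2 * Abar[1, 2] for an off-diagonal entry
```

For a symmetric perturbation of an off-diagonal pair, the directional derivative equals twice the corresponding entry of the symmetric gradient, which is what the check above exercises.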
32 Citations

Topics from this paper

Auto-Differentiating Linear Algebra
• Computer Science, Mathematics
ArXiv
• 2017
This work details how a number of matrix decompositions (Cholesky, LQ, symmetric eigen) can be implemented as differentiable operators in MXNet, running on CPU and GPU in single and double precision.
Stochastic Gradient Descent for Spectral Embedding with Implicit Orthogonality Constraint
• Computer Science, Mathematics
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
• 2019
The contribution consists of reformulating spectral embedding so that it can be solved via stochastic optimization, replacing the orthogonality constraint with an orthogonalization matrix injected directly into the criterion.
GPU-based Parallel Computation Support for Stan
This paper details an extensible OpenCL framework that allows Stan to utilize heterogeneous compute devices. It includes GPU-optimized routines for the Cholesky decomposition, its derivative, and other…
What is the gradient of a scalar function of a symmetric matrix?
• Mathematics, Computer Science
ArXiv
• 2019
It is demonstrated that the relation between the gradient of a real-valued function of a symmetric matrix and the definition of a Fréchet derivative is incorrect, and it is proved that $G_s = \mathrm{sym}(G)$.
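The claim $G_s = \mathrm{sym}(G)$ is easy to check numerically on a toy example. The sketch below is mine, not from that paper: it takes $f(A) = x^\top A y$, whose unconstrained gradient is $G = x y^\top$, and compares $\mathrm{sym}(G)$ against a symmetric finite-difference perturbation of a symmetric argument.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
x, y = rng.standard_normal(n), rng.standard_normal(n)
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                      # symmetric matrix argument

f = lambda M: x @ M @ y                # scalar function of a matrix
G = np.outer(x, y)                     # unconstrained gradient df/dA
G_s = (G + G.T) / 2                    # sym(G)

# Perturb A symmetrically in the (0, 1)/(1, 0) direction and compare
eps = 1e-6
E = np.zeros((n, n)); E[0, 1] = E[1, 0] = 1.0
fd = (f(A + eps * E) - f(A - eps * E)) / (2 * eps)
# fd equals G[0, 1] + G[1, 0], i.e. 2 * G_s[0, 1]
```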
Banded Matrix Operators for Gaussian Markov Models in the Automatic Differentiation Era
• Computer Science, Mathematics
AISTATS
• 2019
The aim of the paper is to make modern inference methods available for Gaussian models with banded precision by equipping an automatic differentiation framework, such as TensorFlow or PyTorch, with linear algebra operators dedicated to banded matrices.
Total stochastic gradient algorithms and applications in reinforcement learning
This work shows how the total derivative rule leads to an intuitive visual framework for creating gradient estimators on graphical models, and derives new gradient estimators based on density estimation, as well as a likelihood ratio gradient.
GaussianProcesses.jl: A Nonparametric Bayes package for the Julia Language
• Computer Science, Mathematics
• 2018
A tutorial for the GaussianProcesses.jl package, a fast, flexible and user-friendly Gaussian processes package developed for the Julia language, which makes efficient use of existing Julia packages to provide users with a range of optimization and plotting tools.
On the Computation of Complex-valued Gradients with Application to Statistically Optimum Beamforming
• Computer Science
ArXiv
• 2017
The computation of gradients by algorithmic differentiation for statistically optimum beamforming operations via the complex-valued chain rule and the derivative of the eigenvalue problem with complex-valued eigenvectors is one of the key results of this report.
Eigendecomposition-Free Training of Deep Networks for Linear Least-Square Problems
• Computer Science, Medicine
IEEE Transactions on Pattern Analysis and Machine Intelligence
• 2021
An eigendecomposition-free approach to training a deep network whose loss depends on the eigenvector corresponding to a zero eigenvalue of a matrix predicted by the network, which has better convergence properties and yields state-of-the-art results.
Eigendecomposition-free Training of Deep Networks with Zero Eigenvalue-based Losses
• Computer Science
ECCV
• 2018
This paper introduces an eigendecomposition-free approach to training a deep network whose loss depends on the eigenvector corresponding to a zero eigenvalue of a matrix predicted by the network.

References

SHOWING 1-10 OF 31 REFERENCES
Structured higher-order algorithmic differentiation in the forward and reverse mode with application in optimum experimental design
It is demonstrated that the novel algorithms can be used to compute the gradient of the optimum experimental design (OED) objective function in the reverse mode of AD, and it is discussed how the necessary higher-order derivatives can be computed using Taylor polynomial arithmetic.
Adjoint Algorithmic Differentiation and the Derivative of the Cholesky Decomposition
Adjoint Algorithmic Differentiation is an efficient method to calculate the sensitivities of financial instruments to the input parameters. It does require an implementation of the derivative of…
Differentiation of the Cholesky Algorithm
Abstract One way to estimate variance components is by restricted maximum likelihood. The log-likelihood function is fully defined by the Cholesky factor of a matrix that is usually large and sparse.
The Elimination Matrix: Some Lemmas and Applications
• Computer Science, Mathematics
SIAM J. Matrix Anal. Appl.
• 1980
Two transformation matrices are introduced, L and D, which contain zero and unit elements only, and are demonstrated in three areas of mathematical statistics and matrix algebra: maximum likelihood estimation of the multivariate normal distribution, the evaluation of Jacobians of transformations with symmetric or lower triangular matrix arguments, and the solution of matrix equations.
Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms
• Computer Science
TOMS
• 2013
Four routines called DPOTF3i, i = a,b,c,d, are presented. DPOTF3i are a novel type of level-3 BLAS for use by BPF (Blocked Packed Format) Cholesky factorization and the LAPACK routine DPOTRF. Performance…
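The blocked structure these level-3 routines exploit can be sketched as follows. This is a simplified illustration of the DPOTRF-style partitioning (factor a small diagonal block, triangular-solve the panel below it, then update the trailing matrix); `blocked_cholesky` is an illustrative name, not any actual LAPACK routine.

```python
import numpy as np
from scipy.linalg import solve_triangular

def blocked_cholesky(A, block=32):
    """Lower-triangular blocked Cholesky: a simplified sketch of the
    level-3 partitioning used by LAPACK's DPOTRF (not the real routine)."""
    n = A.shape[0]
    A = A.copy()
    L = np.zeros_like(A)
    for k in range(0, n, block):
        b = min(block, n - k)
        # 1. Factor the small diagonal block (cheap, level-2 work)
        L[k:k+b, k:k+b] = np.linalg.cholesky(A[k:k+b, k:k+b])
        if k + b < n:
            # 2. Panel solve  L21 = A21 @ inv(L11).T   (TRSM, level 3)
            L[k+b:, k:k+b] = solve_triangular(
                L[k:k+b, k:k+b], A[k+b:, k:k+b].T, lower=True).T
            # 3. Trailing update  A22 -= L21 @ L21.T   (SYRK, level 3)
            A[k+b:, k+b:] -= L[k+b:, k:k+b] @ L[k+b:, k:k+b].T
    return L

# Compare against the reference factorization on a small SPD matrix
rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
A_spd = B @ B.T + 5 * np.eye(5)
L_blocked = blocked_cholesky(A_spd, block=2)  # exercises a ragged final block
```

Almost all the flops land in steps 2 and 3, which are matrix-matrix (level-3) operations and therefore cache-friendly; the same partitioning underlies the blocked derivative algorithms the main paper recommends.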
An extended collection of matrix derivative results for forward and reverse mode algorithmic differentiation
This paper collects together a number of matrix derivative results which are very useful in forward and reverse mode algorithmic differentiation (AD). It highlights in particular the remarkable…
Automatic differentiation in machine learning: a survey
• Mathematics, Computer Science
J. Mach. Learn. Res.
• 2017
By precisely defining the main differentiation techniques and their interrelationships, this work aims to bring clarity to the usage of the terms "autodiff", "automatic differentiation", and "symbolic differentiation" as these are encountered more and more in machine learning settings.
Algorithm 656: an extended set of basic linear algebra subprograms: model implementation and test programs
• Computer Science
TOMS
• 1988
This paper describes a model implementation and test software for the Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS). Level 2 BLAS are targeted at matrix-vector operations with the aim of…
LAPACK Users' Guide
The third edition of LAPACK provided a guide to troubleshooting and installation of routines, as well as examples of how to convert from LINPACK or EISPACK to LAPACK.
Guide to NumPy
This is the second edition of Travis Oliphant's A Guide to NumPy, designed to be a reference that can be used by practitioners who are familiar with Python but want to learn more about NumPy and related tools.