The generalized matrix chain algorithm

@article{Barthels2018TheGM,
  title={The generalized matrix chain algorithm},
  author={Henrik Barthels and Marcin Copik and Paolo Bientinesi},
  journal={Proceedings of the 2018 International Symposium on Code Generation and Optimization},
  year={2018}
}
In this paper, we present a generalized version of the matrix chain algorithm to generate efficient code for linear algebra problems, a task for which human experts often invest days or even weeks of work. The standard matrix chain problem consists in finding the parenthesization of a matrix product M := A1 A2 ⋯ An that minimizes the number of scalar operations. In practical applications, however, one frequently encounters more complicated expressions, involving transposition, inversion, and…

Linnea: Automatic Generation of Efficient Linear Algebra Programs
TLDR
Linnea is a code generator for linear algebra problems that uses a custom best-first search algorithm to find a first solution in less than a second, and increasingly better solutions when given more time.
The Linear Algebra Mapping Problem
TLDR
The aim of this study is to give concrete guidelines for the development of languages and libraries that support linear algebra computations by investigating how effectively a benchmark of test problems is solved by popular high-level programming languages.
The Linear Algebra Mapping Problem. Current state of linear algebra languages and libraries.
TLDR
The problem of mapping a linear algebra expression to a set of available building blocks is defined as the “Linear Algebra Mapping Problem” (LAMP), its NP-complete nature is discussed, and how effectively a benchmark of test problems is solved by popular high-level programming languages and libraries is investigated.
gemm3: Constant-Workspace High-Performance Multiplication of Three Matrices for Matrix Chaining
TLDR
This work derives an algorithm for gemm3, which can multiply three matrices using only a constant amount of additional memory, and presents experimental results which show that the algorithm retains performance comparable to that of current methods.
Optimal sequence for chain matrix multiplication using evolutionary algorithm
TLDR
A new model to minimize the Chain Matrix Multiplication operations based on group counseling optimizer (GCO) is proposed, which provides good performance and reduces the multiplication operations varying from 45% to 96% when compared with sequential multiplication.
Automatic Generation of Efficient Linear Algebra Programs
TLDR
The authors are developing Linnea, a code generator for linear algebra problems that takes a high-level description of a linear algebra problem and produces as output an efficient sequence of calls to high-performance kernels.
Memory Safe Computations with XLA Compiler
TLDR
An XLA compiler extension is developed that adjusts the computational data-flow representation of an algorithm according to a user-specified memory limit, and it is shown that k-nearest neighbour and sparse Gaussian process regression methods can be run at a much larger scale on a single device, where standard implementations would have failed.
NumLin: Linear Types for Linear Algebra
TLDR
It is demonstrated that linear types are well-suited to expressing the APIs of low-level linear algebra libraries accurately and concisely and that, despite the complexity of prior work on it, fractional permissions can actually be implemented using simple, well-known techniques and be used practically in real programs.

References

Application-tailored linear algebra algorithms
TLDR
A knowledge-aware linear algebra compiler is introduced that allows users to input matrix equations together with properties about the operands and the problem itself; for instance, they can specify that the equation is part of a sequence, and how successive instances are related to one another.
A basic linear algebra compiler for structured matrices
TLDR
This paper provides a compiler that translates a given basic linear algebra computation on structured matrices into optimized C code, optionally vectorized with intrinsics, and is extensible to a much larger set including blocked structures.
Very Fast Approximation of the Matrix Chain Product Problem
TLDR
A very fast parallel algorithm for approximately solving the matrix chain product problem and for the problem of finding a near-optimal triangulation of a convex polygon that produces solutions that are at most 1.1547 times the optimal solutions.
Application-tailored Linear Algebra Algorithms: A search-based Approach
TLDR
A knowledge-aware linear algebra compiler is introduced that allows users to input matrix equations together with properties about the operands and the problem itself; for instance, they can specify that the equation is part of a sequence, and how successive instances are related to one another.
Computation of Matrix Chain Products. Part II
TLDR
This paper considers the computation of matrix chain products of the form M_1 × M_2 × ⋯ × M_{n-1} and presents some theorems about an optimum order of computing the matrices.
Knowledge-Based Automatic Generation of Partitioned Matrix Expressions
TLDR
The steps leading to a PME and the knowledge necessary for a symbolic system to perform such steps are discussed and a prototype system written in Mathematica that generates PMEs automatically is introduced, called CLICK.
The design of a parallel dense linear algebra software library: Reduction to Hessenberg, tridiagonal, and bidiagonal form
TLDR
There is a tradeoff between efficiency and software engineering considerations, such as ease of programming and simplicity of code, in the design of ScaLAPACK, a software library for performing dense linear algebra computations on distributed memory concurrent computers.
Automatic Generation of Loop-Invariants for Matrix Operations
TLDR
CL1ck is presented, a symbolic system written in Mathematica that starts with an equation, decomposes it into multiple equations, and returns a set of loop-invariants for the algorithms -- yet to be generated -- that will solve the equation.
Accelerating the Dynamic Programming for the Matrix Chain Product on the GPU
TLDR
The main contribution of this paper is an efficient parallel GPU implementation of the dynamic programming for the matrix chain product problem, the optimization problem of finding the parenthesization of a matrix chain that minimizes the total number of multiplications needed to compute the product.
Families of algorithms related to the inversion of a Symmetric Positive Definite matrix
TLDR
This work states different algorithms for each of the sweeps of the inversion of a symmetric positive definite matrix, as well as algorithms that compute the result in a single sweep, and reports outperforming the current ScaLAPACK implementation by 20-30 percent due to improved load balance on a distributed memory architecture.