Corpus ID: 207984835

Implementing Matrix Inversions

Jakub Kurzak, Mark Gates, Ali Charara, Asim YarKhan, Jack J. Dongarra

A Note On Parallel Matrix Inversion
We present one-sweep parallel algorithms for the inversion of general and symmetric positive definite matrices. The algorithms feature simple programming and performance optimization while
A class of compatible cache consistency protocols and their support by the IEEE Futurebus
This paper defines a class of compatible consistency protocols supported by the current IEEE Futurebus design, referred to as the MOESI class of protocols, which has the property that any system component can select (dynamically) any action permitted by any protocol in the class, and be assured that consistency is maintained throughout the system.
Stability of methods for matrix inversion
Inversion of a triangular matrix can be accomplished in several ways. The standard methods are characterized by the loop ordering, whether matrix-vector multiplication, solution of a triangular
Families of algorithms related to the inversion of a Symmetric Positive Definite matrix
This work presents different algorithms for each sweep of the inversion of a symmetric positive definite matrix, as well as algorithms that compute the result in a single sweep. One algorithm outperforms the current ScaLAPACK implementation by 20-30 percent due to improved load balance on a distributed-memory architecture.
A Primer on Memory Consistency and Cache Coherence
This primer gives readers a basic understanding of consistency and coherence, presenting both high-level concepts and specific, concrete examples from real-world systems.
High performance matrix inversion based on LU factorization for multicore architectures
The reported results from the LU-based matrix inversion implementation significantly outperform state-of-the-art numerical libraries such as LAPACK, MKL, and ScaLAPACK on a contemporary AMD platform with four sockets and a total of 48 cores for a matrix of size 24000.
A Critical Path Approach to Analyzing Parallelism of Algorithmic Variants. Application to Cholesky Inversion
This work derives the critical path length of each algorithmic variant for the authors' application problem, which yields a lower bound on the time to solution of each variant.
Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures
This extended abstract revisits the computation of the inverse of a symmetric positive definite matrix and demonstrates that, for some variants, nontrivial compiler techniques must be applied to further increase the parallelism of the application.