Robust Memory-Aware Mappings for Parallel Multifrontal Factorizations

@article{Agullo2016RobustMM,
  title={Robust Memory-Aware Mappings for Parallel Multifrontal Factorizations},
  author={Emmanuel Agullo and Patrick R. Amestoy and Alfredo Buttari and Abdou Guermouche and Jean-Yves L’Excellent and François-Henry Rouet},
  journal={SIAM J. Sci. Comput.},
  year={2016},
  volume={38}
}
We study the memory scalability of the parallel multifrontal factorization of sparse matrices. In particular, we are interested in controlling the active memory specific to the multifrontal factorization. We illustrate why commonly used mapping strategies (e.g., the proportional mapping) cannot provide a high memory efficiency, which means that they tend to let the memory usage of the factorization grow when the number of processes increases. We propose "memory-aware" algorithms that aim at…
Tacho: Memory-Scalable Task Parallel Sparse Cholesky Factorization
TLDR
A memory-scalable, parallel, sparse multifrontal solver for symmetric positive-definite systems arising in scientific and engineering applications is presented; it respawns tasks when certain conditions are not met.
Sparse Supernodal Solver Using Block Low-Rank Compression
TLDR
Two approaches using a Block Low-Rank (BLR) compression technique to reduce the memory footprint and/or the time-to-solution of the sparse supernodal solver PASTIX are presented.
Task-based multifrontal QR solver for heterogeneous architectures. (Solveur multifrontal QR à base de tâches pour architectures hétérogènes)
TLDR
This study investigates the design of task-based sparse direct solvers on top of runtime systems. These solvers constitute extremely irregular workloads, with tasks of different granularities and characteristics and with variable memory consumption. The study presents a hierarchical strategy for data partitioning and a scheduling algorithm capable of handling the heterogeneity of resources.
Block Low-Rank multifrontal solvers: complexity, performance, and scalability. (Solveurs multifrontaux exploitant des blocs de rang faible: complexité, performance et parallélisme)
TLDR
This thesis proves that BLR multifrontal solvers can achieve a low complexity and investigates the problem of translating that low complexity into actual performance gains on modern architectures. It also presents a multithreaded BLR factorization and analyzes its performance in shared-memory multicore environments on a large set of real-life problems.
Scalability of parallel sparse direct solvers: methods, memory and performance
TLDR
Methods are presented that make sparse direct solvers memory-scalable, that is, capable of taking advantage of parallelism without increasing the overall memory footprint, and it is shown how data sparsity can be used to achieve an asymptotic reduction of the cost of such methods.
Combining sparse approximate factorizations with mixed precision iterative refinement
TLDR
This work develops a new error analysis for LU- and GMRES-based iterative refinement under a general model of LU factorization that accounts for the approximation methods typically used by modern sparse solvers, such as low-rank approximations or relaxed pivoting strategies.
A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices
TLDR
A new algorithm improves the strong scalability of right-looking sparse LU factorization on distributed-memory systems by using a three-dimensional MPI process grid; it aggressively exploits elimination-tree parallelism and trades increased memory for reduced per-process communication.
Dynamic Memory-Aware Task-Tree Scheduling
TLDR
This paper revisits the execution of tree-shaped task graphs using multiple processors that share a bounded memory, and presents a novel heuristic solution that has a low complexity and is guaranteed to complete the tree within a given memory bound.
Parallel Scheduling of Task Trees with Limited Memory
TLDR
This article investigates the execution of tree-shaped task graphs using multiple processors, and designs a series of practical heuristics achieving different trade-offs between the minimization of peak memory usage and makespan.

References

SHOWING 1-10 OF 34 REFERENCES
Memory and performance issues in parallel multifrontal factorizations and triangular solutions with sparse right-hand sides. (Problèmes de mémoire et de performance de la factorisation multifrontale parallèle et de la résolution triangulaire à seconds membres creux)
TLDR
A class of "memory-aware" mapping and scheduling algorithms that aim at maximizing performance while enforcing a user-given memory constraint and provide robust memory estimates before the factorization are proposed.
Constructing memory-minimizing schedules for multifrontal methods
TLDR
This work proposes new schedules to allocate and process tasks that improve memory usage by allowing a more flexible task allocation together with a specific tree traversal and presents optimal algorithms for this new class of schedules.
A Fully Asynchronous Multifrontal Solver Using Distributed Dynamic Scheduling
TLDR
The main features and the tuning of the algorithms for the direct solution of sparse linear systems on distributed-memory computers, developed in the context of a long-term European research project, are analyzed and discussed.
Impact of reordering on the memory of a multifrontal solver
On the Out-Of-Core Factorization of Large Sparse Matrices. (Méthodes directes hors-mémoire (out-of-core) pour la résolution de systèmes linéaires creux de grande taille)
TLDR
This thesis proposes and studies various out-of-core models that aim at limiting the overhead due to data transfers between memory and disks on uniprocessor machines. It focuses on a particular factorization method, the multifrontal method, which it shows can solve large sparse linear systems efficiently.
Parallel Scheduling of Task Trees with Limited Memory
TLDR
This article investigates the execution of tree-shaped task graphs using multiple processors, and designs a series of practical heuristics achieving different trade-offs between the minimization of peak memory usage and makespan.
Task Scheduling for Parallel Multifrontal Methods
TLDR
A new scheduling algorithm for task graphs arising from parallel multifrontal methods for sparse linear systems is presented, based on the theorem proved by Prasanna and Musicus for tree-shaped task graphs, when all tasks exhibit the same degree of parallelism.
A Mapping Algorithm for Parallel Sparse Cholesky Factorization
TLDR
A task-to-processor mapping algorithm is described for computing the parallel multifrontal Cholesky factorization of irregular sparse problems on distributed-memory multiprocessors that is nearly as efficient on a collection of problems with irregular sparsity structure as it is for the regular grid problems.
On the storage requirement in the out-of-core multifrontal method for sparse factorization
Two techniques are introduced to reduce the working storage requirement for the recent multifrontal method of Duff and Reid used in the sparse out-of-core factorization of symmetric matrices. For a…