Robust Memory-Aware Mappings for Parallel Multifrontal Factorizations
@article{Agullo2016RobustMM,
  title={Robust Memory-Aware Mappings for Parallel Multifrontal Factorizations},
  author={Emmanuel Agullo and Patrick R. Amestoy and Alfredo Buttari and Abdou Guermouche and Jean-Yves L’Excellent and François-Henry Rouet},
  journal={SIAM J. Sci. Comput.},
  year={2016},
  volume={38}
}
We study the memory scalability of the parallel multifrontal factorization of sparse matrices. In particular, we are interested in controlling the active memory specific to the multifrontal factorization. We illustrate why commonly used mapping strategies (e.g., the proportional mapping) cannot provide a high memory efficiency, which means that they tend to let the memory usage of the factorization grow when the number of processes increases. We propose "memory-aware" algorithms that aim at…
23 Citations
Tacho: Memory-Scalable Task Parallel Sparse Cholesky Factorization
- Computer Science
- 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
- 2018
A memory-scalable, parallel, sparse multifrontal solver for symmetric positive-definite systems arising in scientific and engineering applications is presented; it respawns tasks when certain conditions are not met.
Sparse Supernodal Solver Using Block Low-Rank Compression
- Computer Science
- 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
- 2017
Two approaches using a Block Low-Rank (BLR) compression technique to reduce the memory footprint and/or the time-to-solution of the sparse supernodal solver PASTIX are presented.
Task-based multifrontal QR solver for heterogeneous architectures. (Solveur multifrontal QR à base de tâches pour architectures hétérogènes)
- Computer Science
- 2015
This study investigates the design of task-based sparse direct solvers on top of runtime systems. Such solvers constitute extremely irregular workloads, with tasks of different granularities and characteristics and with variable memory consumption. The study presents a hierarchical strategy for data partitioning and a scheduling algorithm capable of handling the heterogeneity of resources.
Block Low-Rank multifrontal solvers: complexity, performance, and scalability. (Solveurs multifrontaux exploitant des blocs de rang faible: complexité, performance et parallélisme)
- Computer Science
- 2017
This thesis proves that BLR multifrontal solvers can achieve a low complexity and investigates the problem of translating that low complexity into actual performance gains on modern architectures. It also presents a multithreaded BLR factorization and analyzes its performance in shared-memory multicore environments on a large set of real-life problems.
Scalability of parallel sparse direct solvers: methods, memory and performance
- Computer Science
- 2018
Methods are presented that make sparse direct solvers memory-scalable, that is, capable of taking advantage of parallelism without increasing the overall memory footprint, and it is shown how data sparsity can be used to achieve an asymptotic reduction of the cost of such methods.
Combining sparse approximate factorizations with mixed precision iterative refinement
- Computer Science
- 2022
This work develops a new error analysis for LU- and GMRES-based iterative refinement under a general model of LU factorization that accounts for the approximation methods typically used by modern sparse solvers, such as low-rank approximations or relaxed pivoting strategies.
A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices
- Computer Science
- 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2018
A new algorithm is presented that improves the strong scalability of right-looking sparse LU factorization on distributed memory systems using a three-dimensional MPI process grid; it aggressively exploits elimination tree parallelism and trades off increased memory for reduced per-process communication.
Dynamic Memory-Aware Task-Tree Scheduling
- Computer Science
- 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2017
This paper revisits the execution of tree-shaped task graphs using multiple processors that share a bounded memory, and presents a novel heuristic solution that has a low complexity and is guaranteed to complete the tree within a given memory bound.
A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems
- Computer Science
- J. Parallel Distributed Comput.
- 2019
Parallel Scheduling of Task Trees with Limited Memory
- Computer Science
- TOPC
- 2015
This article investigates the execution of tree-shaped task graphs using multiple processors, and designs a series of practical heuristics achieving different trade-offs between the minimization of peak memory usage and makespan.
References
SHOWING 1-10 OF 34 REFERENCES
Memory and performance issues in parallel multifrontal factorizations and triangular solutions with sparse right-hand sides. (Problèmes de mémoire et de performance de la factorisation multifrontale parallèle et de la résolution triangulaire à seconds membres creux)
- Computer Science
- 2012
A class of "memory-aware" mapping and scheduling algorithms that aim at maximizing performance while enforcing a user-given memory constraint and provide robust memory estimates before the factorization are proposed.
Constructing memory-minimizing schedules for multifrontal methods
- Computer Science
- TOMS
- 2006
This work proposes new schedules to allocate and process tasks that improve memory usage by allowing a more flexible task allocation together with a specific tree traversal and presents optimal algorithms for this new class of schedules.
A Fully Asynchronous Multifrontal Solver Using Distributed Dynamic Scheduling
- Computer Science
- SIAM J. Matrix Anal. Appl.
- 2001
The main features and the tuning of the algorithms for the direct solution of sparse linear systems on distributed memory computers developed in the context of a long term European research project are analyzed and discussed.
Hybrid scheduling for the parallel solution of linear systems
- Computer Science
- Parallel Comput.
- 2006
On the Out-Of-Core Factorization of Large Sparse Matrices. (Méthodes directes hors-mémoire (out-of-core) pour la résolution de systèmes linéaires creux de grande taille)
- Computer Science
- 2008
This thesis proposes and studies various out-of-core models that aim at limiting the overhead due to data transfers between memory and disks on uniprocessor machines. It focuses on a particular factorization method, the multifrontal method, which it shows can solve large sparse linear systems efficiently.
Parallel Scheduling of Task Trees with Limited Memory
- Computer Science
- TOPC
- 2015
This article investigates the execution of tree-shaped task graphs using multiple processors, and designs a series of practical heuristics achieving different trade-offs between the minimization of peak memory usage and makespan.
Task Scheduling for Parallel Multifrontal Methods
- Computer Science
- Euro-Par
- 2007
A new scheduling algorithm for task graphs arising from parallel multifrontal methods for sparse linear systems is presented, based on the theorem proved by Prasanna and Musicus for tree-shaped task graphs, when all tasks exhibit the same degree of parallelism.
A Mapping Algorithm for Parallel Sparse Cholesky Factorization
- Computer Science
- SIAM J. Sci. Comput.
- 1993
A task-to-processor mapping algorithm is described for computing the parallel multifrontal Cholesky factorization of irregular sparse problems on distributed-memory multiprocessors that is nearly as efficient on a collection of problems with irregular sparsity structure as it is for the regular grid problems.
On the storage requirement in the out-of-core multifrontal method for sparse factorization
- Computer Science
- TOMS
- 1986
Two techniques are introduced to reduce the working storage requirement for the recent multifrontal method of Duff and Reid used in the sparse out-of-core factorization of symmetric matrices. For a…