# The LINPACK Benchmark: past, present and future

@article{Dongarra2003TheLB, title={The LINPACK Benchmark: past, present and future}, author={Jack J. Dongarra and Piotr Luszczek and Antoine Petitet}, journal={Concurrency and Computation: Practice and Experience}, year={2003}, volume={15} }

This paper describes the LINPACK Benchmark and some of its variations commonly used to assess the performance of computer systems. Aside from the LINPACK Benchmark suite, the TOP500 and the HPL codes are presented. The latter is frequently used to obtain results for TOP500 submissions. Information is also given on how to interpret the results of the benchmark and how the results fit into the performance evaluation process. Copyright © 2003 John Wiley & Sons, Ltd.
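
At its core, the benchmark times the solution of a dense system `Ax = b` by Gaussian elimination with partial pivoting, converts that time into a floating-point rate using a fixed operation count, and checks a scaled residual. The pure-Python sketch below illustrates those three steps under stated assumptions: the operation count `2n^3/3 + 2n^2` follows the classic LINPACK convention (HPL uses a slightly different lower-order term), the residual normalization is one common form rather than HPL's exact acceptance test, and the names `lu_factor`, `lu_solve` and `linpack_run` are hypothetical. It is far too slow to say anything about real hardware.

```python
import random
import sys
import time


def lu_factor(a):
    """Factor the square matrix `a` in place as P*A = L*U with partial pivoting.

    Afterwards `a` holds U on and above the diagonal and the multipliers of the
    unit lower-triangular L below it; piv[k] is the row swapped with row k.
    """
    n = len(a)
    piv = list(range(n))
    for k in range(n):
        # Partial pivoting: choose the largest-magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        piv[k] = p
        a[k], a[p] = a[p], a[k]  # swap whole rows, including stored multipliers
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]  # multiplier l_ik
            m = a[i][k]
            for j in range(k + 1, n):
                a[i][j] -= m * a[k][j]
    return piv


def lu_solve(lu, piv, b):
    """Solve A*x = b using the packed factors produced by lu_factor."""
    n = len(lu)
    x = list(b)
    for k in range(n):  # apply the recorded row interchanges to b
        x[k], x[piv[k]] = x[piv[k]], x[k]
    for i in range(n):  # forward substitution with unit-lower L
        x[i] -= sum(lu[i][j] * x[j] for j in range(i))
    for i in reversed(range(n)):  # back substitution with U
        x[i] = (x[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


def linpack_run(n, seed=0):
    """Return (Mflop/s, scaled residual) for one random n-by-n system."""
    rng = random.Random(seed)
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]  # b = A times a vector of ones (up to rounding)
    a0 = [row[:] for row in a]  # keep an untouched copy of A for the residual
    start = time.perf_counter()
    piv = lu_factor(a)
    x = lu_solve(a, piv, b)
    elapsed = max(time.perf_counter() - start, 1e-9)
    flops = 2.0 * n**3 / 3.0 + 2.0 * n**2
    # Scaled residual ||Ax - b||_inf / (n * ||A||_inf * ||x||_inf * eps).
    r = max(abs(sum(a0[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    norm_a = max(sum(abs(v) for v in row) for row in a0)
    norm_x = max(abs(v) for v in x)
    resid = r / (n * norm_a * norm_x * sys.float_info.epsilon)
    return flops / elapsed / 1e6, resid
```

Calling `linpack_run(100)` returns a (very small) Mflop/s figure together with a residual that should be small; a large scaled residual would indicate that the computed solution cannot be trusted, whatever the rate.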

## 787 Citations

### Complex version of high performance computing LINPACK benchmark (HPL)

- Computer Science
- Concurr. Comput. Pract. Exp.
- 2010

The results show that the modified HPL software brings a significant increase in the performance of the solver when simulating the highest resolution experiments thus far configured, achieving 87.5 TFLOPS on over 20 000 processors on the Cray XT4.

### A Few of the Most Popular Tools for Evaluating Supercomputers

- Computer Science
- 2018 17th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES)
- 2018

The purpose, significance and method of benchmarking supercomputers, the state of the art, and a few of the mainstream benchmarks for supercomputer evaluation are discussed.

### Implementation of the Mixed-Precision High Performance LINPACK Benchmark on the CELL Processor

- Computer Science
- 2006

This paper describes in detail the implementation of code to solve a linear system of equations using Gaussian elimination in single precision with iterative refinement of the solution to full double-precision accuracy.
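
The appeal of this scheme is that the O(n^3) factorization runs in fast single precision, while each refinement step needs only O(n^2) double-precision work for the residual. The sketch below emulates single precision in pure Python by rounding every intermediate result to IEEE binary32 with the standard `struct` module. It is not the CELL implementation; it assumes a nonsingular, well-conditioned matrix, and the names `f32`, `lu_factor_single`, `lu_solve_single` and `mixed_precision_solve` are hypothetical.

```python
import struct


def f32(x):
    """Round a Python float (IEEE binary64) to the nearest IEEE binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_single(a):
    """LU with partial pivoting, every operation rounded to single precision.

    Returns (lu, piv) in packed LU form; the input matrix is left untouched.
    """
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    piv = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        piv[k] = p
        lu[k], lu[p] = lu[p], lu[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, piv


def lu_solve_single(lu, piv, b):
    """Triangular solves with the single-precision factors, in single precision."""
    n = len(lu)
    x = [f32(v) for v in b]
    for k in range(n):
        x[k], x[piv[k]] = x[piv[k]], x[k]
    for i in range(n):  # forward substitution (unit-lower L)
        for j in range(i):
            x[i] = f32(x[i] - f32(lu[i][j] * x[j]))
    for i in reversed(range(n)):  # back substitution (U)
        for j in range(i + 1, n):
            x[i] = f32(x[i] - f32(lu[i][j] * x[j]))
        x[i] = f32(x[i] / lu[i][i])
    return x


def mixed_precision_solve(a, b, max_iter=10, tol=1e-14):
    """Single-precision factorization plus double-precision iterative refinement."""
    n = len(a)
    lu, piv = lu_factor_single(a)  # O(n^3) work, single precision
    x = lu_solve_single(lu, piv, b)  # roughly single-precision accurate start
    bnorm = max(abs(v) for v in b)
    for _ in range(max_iter):
        # Residual r = b - A x computed in double precision: O(n^2) work.
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * bnorm:
            break
        d = lu_solve_single(lu, piv, r)  # correction from the cheap factors
        x = [x[i] + d[i] for i in range(n)]  # update kept in double precision
    return x
```

Each pass reuses the single-precision factors, so the expensive step is done once; the refinement loop converges only when the matrix is not too ill-conditioned relative to single-precision accuracy.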

### Benchmarking and Analysis of High Productivity Computing (HPCS)

- Computer Science
- 2006

The overall objective of this effort was to survey a number of DoD-related applications in order to ascertain their needs with respect to metrics: which metrics already exist and which still need to be developed.

### Measuring the performance of parallel computers with distributed memory

- Computer Science
- 2009

It is shown that the results obtained with the de facto standard LINPACK benchmark suite are only weakly related to the efficiency of applied parallel programs, and that the models and methods proposed by V. M. Glushkov in the late 1970s are becoming topical again.

### The LAPACK for clusters project: an example of self adapting numerical software

- Computer Science
- Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004
- 2004

The context, design, and recent development of the LAPACK for clusters (LFC) project, which has been developed in the framework of self-adapting numerical software (SANS), are described.

### Self-adapting software for numerical linear algebra and LAPACK for clusters

- Computer Science
- Parallel Comput.
- 2003

### The LINPACK Benchmark on a Multi-Core Multi-FPGA System

- Computer Science
- 2008

Results show that, when using small data sets, one FPGA can provide a speedup of 1.94 over a high-end processor running the LINPACK Benchmark with Level 1 BLAS; however, there is still opportunity to do better, especially when scaling to larger systems.
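
In the original LINPACK Benchmark, Gaussian elimination is written column by column on top of Level 1 BLAS, so most of the time is spent in DAXPY (y ← αx + y), with IDAMAX choosing the pivot and DSCAL forming the multipliers. The sketch below mirrors that structure of the DGEFA/DGESL routines in pure Python, assuming the matrix is stored as a list of columns and using 0-based indices rather than Fortran's 1-based ones; the helper signatures with explicit `lo`/`hi` bounds and the names `dgefa_like`/`dgesl_like` are hypothetical simplifications, not the real BLAS interfaces.

```python
def idamax(x, lo, hi):
    """Index in [lo, hi) of the first entry of largest magnitude."""
    return max(range(lo, hi), key=lambda i: abs(x[i]))


def dscal(alpha, x, lo, hi):
    """x[lo:hi] <- alpha * x[lo:hi], in place."""
    for i in range(lo, hi):
        x[i] *= alpha


def daxpy(alpha, x, y, lo, hi):
    """y[lo:hi] <- alpha * x[lo:hi] + y[lo:hi], in place."""
    for i in range(lo, hi):
        y[i] += alpha * x[i]


def dgefa_like(cols):
    """Column-oriented LU with partial pivoting; `cols[j]` is column j.

    As in LINPACK, multipliers are stored negated below the diagonal and
    earlier columns are not row-swapped; ipvt records each pivot row.
    """
    n = len(cols)
    ipvt = []
    for k in range(n - 1):
        colk = cols[k]
        l = idamax(colk, k, n)
        if colk[l] == 0.0:
            raise ZeroDivisionError("zero pivot: matrix is singular")
        ipvt.append(l)
        colk[k], colk[l] = colk[l], colk[k]
        dscal(-1.0 / colk[k], colk, k + 1, n)  # negated multipliers
        for j in range(k + 1, n):
            colj = cols[j]
            t = colj[l]
            colj[l], colj[k] = colj[k], t  # row interchange within column j
            daxpy(t, colk, colj, k + 1, n)  # eliminate below the diagonal
    if cols[n - 1][n - 1] == 0.0:
        raise ZeroDivisionError("zero pivot: matrix is singular")
    ipvt.append(n - 1)
    return ipvt


def dgesl_like(cols, ipvt, b):
    """Solve A*x = b in place using the output of dgefa_like."""
    n = len(cols)
    for k in range(n - 1):  # forward elimination, applying pivots as we go
        l = ipvt[k]
        t = b[l]
        b[l], b[k] = b[k], t
        daxpy(t, cols[k], b, k + 1, n)
    for k in reversed(range(n)):  # back substitution, column by column
        b[k] /= cols[k][k]
        daxpy(-b[k], cols[k], b, 0, k)
    return b
```

Because every inner loop is a vector operation with little data reuse, this formulation is limited by memory bandwidth, which is one reason accelerating it is harder than accelerating a Level 3 BLAS formulation.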

### Self-Adapting Software for Numerical Linear Algebra Library Routines on Clusters

- Computer Science
- International Conference on Computational Science
- 2003

The context, design, and recent development of the LAPACK for Clusters (LFC) project are described. LFC was developed in the framework of Self-Adapting Numerical Software (SANS), which, it is argued, can deliver the convenience and ease of use of existing sequential environments combined with the power and versatility of highly tuned parallel codes that execute on clusters.
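
One kind of decision such a self-adapting layer can make on the user's behalf is how to arrange the available processes for a parallel dense solver. As an illustration only (a common rule of thumb for HPL-style solvers, not LFC's actual selection logic), the hypothetical helper `near_square_grid` below picks the most nearly square P × Q process grid with P ≤ Q for a given process count.

```python
import math


def near_square_grid(nprocs):
    """Return (P, Q) with P * Q == nprocs, P <= Q, and P as large as possible."""
    if nprocs < 1:
        raise ValueError("need at least one process")
    p = math.isqrt(nprocs)  # largest candidate with p <= sqrt(nprocs)
    while nprocs % p != 0:  # walk down to the nearest divisor
        p -= 1
    return p, nprocs // p
```

For 12 processes this gives a 3 × 4 grid, while a prime count such as 7 degenerates to 1 × 7, which is the kind of situation where a self-adapting layer might prefer to use fewer processes.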

### Multiobjective Optimization of the Variability of the High-Performance LINPACK Solver

- Computer Science
- 2020

It is shown that specific configurations of the solver can be used to control variability at a small sacrifice in mean throughput, and configurations are identified that yield relatively high mean throughput but also high throughput variability.
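
Treating mean throughput and run-to-run variability as two competing objectives leads naturally to a Pareto front: the configurations that no other configuration beats on both counts. The sketch below assumes each candidate solver configuration (for example, a hypothetical tuple of block size and process-grid shape) has been run several times with its Gflop/s recorded, and it uses the sample standard deviation as the variability measure, which may differ from the measure used in the paper. The names `summarize` and `pareto_front` are hypothetical.

```python
from statistics import mean, stdev


def summarize(runs):
    """Map each configuration to (mean Gflop/s, sample standard deviation).

    `runs` maps a configuration key to a list of at least two measurements.
    """
    return {cfg: (mean(v), stdev(v)) for cfg, v in runs.items()}


def pareto_front(summary):
    """Configurations not dominated by any other one.

    Config B dominates A if B's mean is >= A's and B's spread is <= A's,
    with at least one of the two strictly better.
    """
    front = []
    for cfg, (m, s) in summary.items():
        dominated = any(
            m2 >= m and s2 <= s and (m2 > m or s2 < s)
            for other, (m2, s2) in summary.items()
            if other != cfg
        )
        if not dominated:
            front.append(cfg)
    return sorted(front)
```

The front usually contains both a high-mean, high-variance configuration and a lower-mean, low-variance one; choosing between them is a policy decision rather than something the optimization settles.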

## References

The paper lists 106 references; a selection is shown below.

### LINPACK user's guide

- Computer Science
- 1980

The guide, co-authored by G. W. Stewart, covers among other topics the use of least-squares techniques.

### Implementation of the BLAS level 3 and LINPACK Benchmark on the AP1000

- Computer Science
- 1992

An implementation of Level 3 of the Basic Linear Algebra Subprograms (BLAS-3) library and the LINPACK Benchmark on the Fujitsu AP1000 is described, which achieves 85-90% of the AP1000's theoretical peak speed for the BLAS Level 3 procedures and up to 80% for the LINPACK Benchmark.
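
Level 3 BLAS can run close to peak because a matrix-matrix product performs O(n^3) flops on O(n^2) data, so a block of the operands loaded into fast memory is reused many times. The loop-tiling sketch below shows the idea in pure Python (it will not approach any machine's peak); the block size `nb` and the helpers `gemm_blocked` and `fraction_of_peak` are hypothetical, and this is not the AP1000 implementation.

```python
def gemm_blocked(a, b, c, nb=32):
    """C <- C + A*B for list-of-rows matrices, processed in nb-by-nb tiles.

    Each (ii, kk, jj) tile step touches only small sub-blocks of A, B and C,
    which is what lets optimized Level 3 BLAS keep operands in cache or
    local memory and reuse them many times.
    """
    n, k, m = len(a), len(b), len(b[0])
    for ii in range(0, n, nb):
        for kk in range(0, k, nb):
            for jj in range(0, m, nb):
                for i in range(ii, min(ii + nb, n)):
                    a_row, c_row = a[i], c[i]
                    for p in range(kk, min(kk + nb, k)):
                        a_ip, b_row = a_row[p], b[p]
                        for j in range(jj, min(jj + nb, m)):
                            c_row[j] += a_ip * b_row[j]
    return c


def fraction_of_peak(flops, seconds, peak_flops_per_second):
    """Achieved rate as a fraction of the machine's theoretical peak rate."""
    return flops / seconds / peak_flops_per_second
```

For an n × n product the flop count passed to `fraction_of_peak` is conventionally 2n^3, so a figure such as "85-90% of peak" is that ratio measured on the real kernel.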

### Performance of various computers using standard linear equations software

- Computer Science
- CARN
- 1990

This report compares the performance of different computer systems in solving dense systems of linear equations, ranging from a CRAY Y-MP to scientific workstations such as the Apollo and Sun to IBM PCs.

### High Performance Software on Intel Pentium Pro Processors or Micro-Ops to TeraFLOPS

- Computer Science
- ACM/IEEE SC 1997 Conference (SC'97)
- 1997

An overview is given of the efforts to obtain the world's first TeraFLOP MP LINPACK run (on the Intel ASCI Option Red supercomputer, based on Pentium Pro processor technology), and of the optimization strategies used to achieve high performance on scientific applications.

### Basic Linear Algebra Subprograms for Fortran Usage

- Computer Science
- TOMS
- 1979

A package of 38 low level subprograms for many of the basic operations of numerical linear algebra is presented, intended to be used with FORTRAN.
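
The Level 1 BLAS routines address vectors by a starting position and an increment (stride), which lets one routine walk either a row or a column of a matrix stored in a flat array. The sketch below restates the reference semantics of three of them, DDOT, DAXPY and IDAMAX, in Python. It assumes positive increments only and 0-based indexing, whereas the Fortran originals are 1-based and also define behaviour for non-positive increments; returning -1 from `idamax` for an empty vector is a choice of this sketch.

```python
def ddot(n, x, incx, y, incy):
    """Return sum over i < n of x[i*incx] * y[i*incy] (positive strides)."""
    return sum(x[i * incx] * y[i * incy] for i in range(n))


def daxpy(n, alpha, x, incx, y, incy):
    """y[i*incy] <- alpha * x[i*incx] + y[i*incy] for i < n, in place."""
    if n <= 0 or alpha == 0.0:
        return  # nothing to do, mirroring the early returns of the reference code
    for i in range(n):
        y[i * incy] += alpha * x[i * incx]


def idamax(n, x, incx):
    """0-based position i < n of the first largest |x[i*incx]|; -1 if n <= 0."""
    if n <= 0:
        return -1
    return max(range(n), key=lambda i: abs(x[i * incx]))
```

For example, with a 3 × 3 matrix stored row by row in `a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]`, `ddot(3, a, 1, a[1:], 3)` multiplies row 0 by column 1 and returns 36.0, and `idamax(3, a[2:], 3)` scans column 2 and returns 2.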

### Parallel implementation of BLAS: general techniques for Level 3 BLAS

- Computer Science
- Concurr. Pract. Exp.
- 1997

It is shown that the techniques used for the matrix-matrix multiplication naturally extend to all important level 3 BLAS and thus this approach becomes an enabling technology for efficient parallel implementation of these routines and libraries that use BLAS.

### Parallel LU Decomposition on a Transputer Network

- Computer Science
- Shell Conference
- 1988

A general Cartesian data distribution scheme is presented which contains many of the existing distribution schemes as special cases and is used to prove optimality of load balance for the grid distribution.
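
A convenient way to see how one distribution scheme can contain others as special cases is the block-cyclic mapping used by ScaLAPACK and HPL: with block size 1 it reduces to a cyclic (wrapped) distribution, and with one block per process it reduces to a plain block distribution. Applying it independently to rows and columns gives a Cartesian distribution over a P × Q process grid. The sketch below is that standard mapping, not necessarily the exact scheme of the cited paper; the names `owner_and_local` and `cartesian_owner` are hypothetical.

```python
def owner_and_local(g, nb, nprocs):
    """1-D block-cyclic map: global index g -> (owning process, local index).

    Blocks of nb consecutive indices are dealt out to processes round-robin.
    """
    block, offset = divmod(g, nb)
    proc = block % nprocs
    local = (block // nprocs) * nb + offset
    return proc, local


def cartesian_owner(i, j, mb, nb, p, q):
    """2-D map of matrix entry (i, j) onto a p x q process grid.

    Rows use block size mb over p process rows; columns use nb over q columns.
    Returns ((process_row, process_col), (local_row, local_col)).
    """
    pr, li = owner_and_local(i, mb, p)
    pc, lj = owner_and_local(j, nb, q)
    return (pr, pc), (li, lj)
```

With block size 1 and three processes, `owner_and_local(4, 1, 3)` returns `(1, 1)`: process 1 holds indices 1, 4, 7 and so on, and index 4 is its second entry.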

### Experiments with Multicomputer LU-decomposition

- Computer Science
- Concurr. Pract. Exp.
- 1990

It is shown that LU‐decomposition with some pivoting strategies is both faster and numerically more stable than LU‐decomposition without pivoting, and is equivalent to randomizing the data distribution.
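
The stability part of that claim is easy to see on a 2 × 2 system with a tiny leading entry. The sketch below uses plain Gaussian elimination with a `pivot` switch; `gauss_solve` is a hypothetical helper, not the paper's multicomputer code, and it says nothing about speed or data distribution.

```python
def gauss_solve(a, b, pivot=True):
    """Solve A*x = b by Gaussian elimination, optionally with partial pivoting.

    The inputs are copied, so the caller's matrix and vector are not modified.
    """
    n = len(a)
    a = [row[:] for row in a]
    b = list(b)
    for k in range(n):
        if pivot:
            # Bring the largest-magnitude entry of column k to the diagonal.
            p = max(range(k, n), key=lambda i: abs(a[i][k]))
            a[k], a[p] = a[p], a[k]
            b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]  # huge when the pivot is tiny
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x
```

For `a = [[1e-20, 1.0], [1.0, 1.0]]` and `b = [1.0, 2.0]`, whose exact solution is very close to `[1, 1]`, elimination without pivoting uses a multiplier of about 1e20 that wipes out the second row's information and returns `[0.0, 1.0]`, while the pivoted version returns `[1.0, 1.0]`.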