The LINPACK Benchmark: past, present and future

@article{Dongarra2003TheLB,
  title={The LINPACK Benchmark: past, present and future},
  author={Jack J. Dongarra and Piotr Luszczek and Antoine Petitet},
  journal={Concurrency and Computation: Practice and Experience},
  year={2003},
  volume={15}
}
This paper describes the LINPACK Benchmark and some of its variations commonly used to assess the performance of computer systems. Aside from the LINPACK Benchmark suite, the TOP500 and the HPL codes are presented. The latter is frequently used to obtain results for TOP500 submissions. Information is also given on how to interpret the results of the benchmark and how the results fit into the performance evaluation process. Copyright © 2003 John Wiley & Sons, Ltd.
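As a rough illustration of how a benchmark result is turned into a rate, the sketch below converts a problem size and a wall-clock time into Gflop/s. It assumes the nominal operation count conventionally quoted for the LINPACK Benchmark, 2/3·n³ + 2·n² floating-point operations for solving a dense n×n system; the function names are hypothetical and are not taken from the benchmark codes.

```python
def linpack_flop_count(n: int) -> float:
    """Nominal operation count for solving a dense n-by-n system.

    Assumption: the conventional LINPACK figure 2/3*n^3 + 2*n^2 is used,
    regardless of how many operations the solver actually performed.
    """
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def linpack_gflops(n: int, seconds: float) -> float:
    """Convert a problem size and a measured solve time into Gflop/s."""
    if seconds <= 0.0:
        raise ValueError("elapsed time must be positive")
    return linpack_flop_count(n) / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical measurement: n = 1000 solved in 1.0 second.
    print(f"{linpack_gflops(1000, 1.0):.4f} Gflop/s")
```

Because the rate is derived from a fixed nominal count, two runs with the same n can be compared directly by their times.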

Complex version of high performance computing LINPACK benchmark (HPL)

The results show that the modified HPL software brings a significant increase in the performance of the solver when simulating the highest resolution experiments thus far configured, achieving 87.5 TFLOPS on over 20 000 processors on the Cray XT4.

A Few of the Most Popular Tools for Evaluating Supercomputers

  • Gang Xie, Ya-lin Zhang
  • Computer Science
    2018 17th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES)
  • 2018
The purpose, significance and method of benchmarking supercomputers, the state of the art, and a few of the mainstream benchmarks for supercomputer evaluation are discussed.

Implementation of the Mixed-Precision High Performance LINPACK Benchmark on the CELL Processor

This paper describes in detail the implementation of code to solve linear system of equations using Gaussian elimination in single precision with iterative refinement of the solution to the full double precision accuracy.
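The idea summarized above, factoring in single precision and then refining the solution to double-precision accuracy, can be sketched in plain Python. This is a minimal illustration and not the CELL implementation from the paper: single precision is emulated by rounding through `struct`, and all function names, the iteration limit, and the tolerance are assumptions made for the example.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_f32(a):
    """LU factorization with partial pivoting, rounding every result to single."""
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))  # perm[i] = original index of the row now at position i
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])  # multiplier stored in L
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm


def lu_solve_f32(lu, perm, b):
    """Forward and back substitution in emulated single precision."""
    n = len(lu)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):  # solve L y = P b (unit lower triangular)
        for j in range(i):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
    for i in reversed(range(n)):  # solve U x = y
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
        y[i] = f32(y[i] / lu[i][i])
    return y


def solve_mixed_precision(a, b, max_iter=10, tol=1e-14):
    """Single-precision LU plus iterative refinement with double residuals."""
    n = len(a)
    lu, perm = lu_factor_f32(a)
    x = lu_solve_f32(lu, perm, b)
    for _ in range(max_iter):
        # The residual is computed in double precision.
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * max(abs(v) for v in b):
            break
        d = lu_solve_f32(lu, perm, r)  # correction reuses the cheap factors
        x = [x[i] + d[i] for i in range(n)]
    return x


if __name__ == "__main__":
    a = [[4.0, 1.0, 2.0], [1.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
    x_true = [0.1, 0.2, 0.3]
    b = [sum(a[i][j] * x_true[j] for j in range(3)) for i in range(3)]
    print(solve_mixed_precision(a, b))
```

The expensive cubic-cost factorization runs once in the faster precision, while each refinement step costs only a quadratic-cost residual and two triangular solves. The scheme is expected to converge only when the matrix is not too ill-conditioned for single precision.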

Benchmarking and Analysis of High Productivity Computing (HPCS)

The overall objective of this effort was to survey a number of DoD-related applications to ascertain their needs and to determine what metrics exist and what metrics need to be developed.

Measuring the performance of parallel computers with distributed memory

The results obtained via the de facto standard LINPACK benchmark suite are shown to be weakly related to the efficiency of applied parallel programs, and models and methods proposed by V. M. Glushkov in the late 1970s are shown to have become topical again.

The LAPACK for clusters project: an example of self adapting numerical software

The context, design, and recent development of the LAPACK for clusters (LFC) project, which has been developed in the framework of self-adapting numerical software (SANS), are described.

The LINPACK Benchmark on a Multi-Core Multi-FPGA System

Results show that, when using small sets of data, one FPGA can provide a speedup of 1.94 over a high-end processor running the LINPACK Benchmark with Level 1 BLAS. However, there is still opportunity to do better, especially when scaling to larger systems.

Self-Adapting Software for Numerical Linear Algebra Library Routines on Clusters

The context, design, and recent development of the LAPACK for Clusters (LFC) project are described. The project was developed in the framework of Self-Adapting Numerical Software (SANS), which, it is argued, can deliver the convenience and ease of use of existing sequential environments bundled with the power and versatility of highly tuned parallel codes that execute on clusters.

MULTIOBJECTIVE OPTIMIZATION OF THE VARIABILITY OF THE HIGH-PERFORMANCE LINPACK SOLVER

It is shown that specific configurations of the solver can be used to control variability at a small sacrifice in mean throughput, and configurations are identified that yield a relatively high mean throughput but also a high throughput variability.
...

References

LINPACK user's guide

The use of least-squares techniques for this and G. W. Stewart, LINPACK Users' Guide for Intel® Math Kernel Library 11.3 for Linux* OS are provided.

Implementation of the BLAS level 3 and LINPACK Benchmark on the AP1000

An implementation of Level 3 of the Basic Linear Algebra Subprograms (BLAS-3) library and the LINPACK Benchmark on the Fujitsu AP1000 is described, which achieves 85-90% of the AP1000's theoretical peak speed for the BLAS Level 3 procedures and up to 80% for the LINPACK benchmark.

Performance of various computers using standard linear equations software

This report compares the performance of different computer systems in solving dense systems of linear equations, ranging from a CRAY Y-MP to scientific workstations such as the Apollo and Sun to IBM PCs.

High Performance Software on Intel Pentium Pro Processors or Micro-Ops to TeraFLOPS

A model into the efforts on obtaining the world's first TeraFLOP MP LINPACK run (on the Intel ASCI Option Red Supercomputer), based on Pentium Pro processor technology, and optimization strategies used to achieve high performance on scientific applications.

Parallel implementation of BLAS: general techniques for Level 3 BLAS

It is shown that the techniques used for the matrix-matrix multiplication naturally extend to all important level 3 BLAS and thus this approach becomes an enabling technology for efficient parallel implementation of these routines and libraries that use BLAS.

Sparse matrix calculations on the CRAY-2

The Multicomputer Toolbox approach to concurrent BLAS and LACS

There is limited leverage in LACS per se as a stand-alone message-passing standard, and it is proposed that needed capabilities instead be integrated in a general, application-level message passing standard, focusing attention on CBLAS and large-scale application needs.

Parallel LU Decomposition on a Transputer Network

A general Cartesian data distribution scheme is presented which contains many of the existing distribution schemes as special cases and is used to prove optimality of load balance for the grid distribution.

Experiments with Multicomputer LU-decomposition

It is shown that LU-decomposition with some pivoting strategies is both faster and numerically more stable than LU-decomposition without pivoting, and is equivalent to randomizing the data distribution.
...