The increase in performance of the last generations of graphics processors (GPUs) has made this class of platform a coprocessing tool with remarkable success in certain types of operations. In this paper we evaluate the performance of the Level 3 operations in CUBLAS, the implementation of BLAS for NVIDIA® GPUs with unified architecture. From this study,…
Solving Dense Linear Systems on GPUs. Barrachina et al. The power and versatility of modern GPUs have transformed them into the first widely extended HPC platform. The solution of dense linear systems arises in a wide variety of fields. How does the new generation of GPUs adapt to this type of problem?…
This paper analyzes the performance of two parallel algorithms for solving the linear-quadratic optimal control problem arising in discrete-time periodic linear systems. The algorithms perform a sequence of orthogonal reordering transformations on formal matrix products associated with the periodic linear system, and then employ the so-called matrix disk…
The last six years have seen Moore's Law continue to produce incredible gains in computational power. Indeed, the November 2007 list of the top ten fastest supercomputers in the world contained no machines with acceleration of any kind. The same list six years later has four of the ten fastest supercomputers using accelerators, including the top two…
We present several algorithms to compute the solution of a linear system of equations on a GPU, as well as general techniques to improve their performance, such as padding and hybrid GPU-CPU computation. We compare single and double precision performance of a modern GPU with unified architecture, and show how iterative refinement with mixed precision can be…