- Emmanuel Agullo, Jim Demmel, +6 authors Stanimire Tomov
- 2009

The emergence and continuing use of multi-core architectures and graphics processing units require changes in the existing software and sometimes even a redesign of the established algorithms in order to take advantage of now prevailing parallelism. Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) and Matrix Algebra on GPU and Multics… (More)

- Stanimire Tomov, Jack J. Dongarra, Marc Baboulin
- Parallel Computing
- 2010
- doi:10.1016/j.parco.2009.12.005

We highlight the trends leading to the increased appeal of using hybrid multicore + GPU systems for high performance computing. We present a set of techniques that can be used to… (More)

- Emmanuel Agullo, Cédric Augonnet, +4 authors Stanimire Tomov
- 2011 IEEE International Parallel & Distributed…
- 2011

One of the major trends in the design of exascale architectures is the use of multicore nodes enhanced with GPU accelerators. Exploiting all resources of a hybrid accelerators-based node at their maximum potential is thus a fundamental step towards exascale computing. In this article, we present the design of a highly efficient QR factorization for such a… (More)

- Jakub Kurzak, Stanimire Tomov, Jack J. Dongarra
- IEEE Transactions on Parallel and Distributed…
- 2012

In recent years, the use of graphics chips has been recognized as a viable way of accelerating scientific and engineering applications, even more so since the introduction of the Fermi architecture by NVIDIA, with features essential to numerical computing, such as fast double precision arithmetic and memory protected with error correction codes. Being the… (More)

We present an improved matrix-matrix multiplication routine (GEMM) in the MAGMA BLAS library that targets the Fermi GPUs. We show how to modify the previous MAGMA GEMM kernels in order to make a more efficient use of the Fermi’s new architectural features, most notably their extended memory hierarchy and sizes. The improved kernels run at up to 300 GFlop/s… (More)

- Stanimire Tomov, Rajib Nath, Hatem Ltaief, Jack J. Dongarra
- 2010 IEEE International Symposium on Parallel…
- 2010

Solving dense linear systems of equations is a fundamental problem in scientific computing. Numerical simulations involving complex systems represented in terms of unknown variables and relations between them often lead to linear systems of equations that must be solved as fast as possible. We describe current efforts toward the development of these… (More)

- Peng Du, Rick Weber, Piotr Luszczek, Stanimire Tomov, Gregory D. Peterson, Jack J. Dongarra
- Parallel Computing
- 2012

In this work, we evaluate OpenCL as a programming tool for developing performance-portable applications for GPGPU. While the Khronos group developed OpenCL with programming portability in mind, performance is not necessarily portable. OpenCL has required performance-impacting initializations that do not exist in other languages such as CUDA. Understanding… (More)

- Yinan Li, Jack J. Dongarra, Stanimire Tomov
- ICCS
- 2009

The development of high performance dense linear algebra (DLA) critically depends on highly optimized BLAS, and especially on the matrix multiplication routine (GEMM). This is especially true for Graphics Processing Units (GPUs), as evidenced by recently published results on DLA for GPUs that rely on highly optimized GEMM. However, the current best GEMM… (More)

- Emmanuel Agullo, Cédric Augonnet, +4 authors Stanimire Tomov
- 2011 9th IEEE/ACS International Conference on…
- 2011

Multicore architectures enhanced with multiple GPUs are likely to become mainstream High Performance Computing (HPC) platforms in the near future. In this paper, we present the design and implementation of an LU factorization using a tile algorithm that can fully exploit the potential of such platforms in spite of their complexity. We use a methodology derived… (More)

- Rajib Nath, Stanimire Tomov, Jack J. Dongarra
- IJHPCA
- 2010

We present an improved matrix–matrix multiplication routine (General Matrix Multiply [GEMM]) in the MAGMA BLAS library that targets the NVIDIA Fermi graphics processing units (GPUs) using Compute Unified Data Architecture (CUDA). We show how to modify the previous MAGMA GEMM kernels in order to make a more efficient use of the Fermi’s new architectural… (More)