- Stanimire Tomov, Jack J. Dongarra, Marc Baboulin
- Parallel Computing
- 2010

doi:10.1016/j.parco.2009.12.005. We highlight the trends leading to the increased appeal of using hybrid multicore + GPU systems for high performance computing. We present a set of techniques that can be used to…

- Marc Baboulin, Alfredo Buttari, +5 authors Stanimire Tomov
- Computer Physics Communications
- 2009

Author affiliations: (a) Department of Mathematics, University of Coimbra, Coimbra, Portugal; (b) French National Institute for Research in Computer Science and Control, Lyon, France; (c) Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA; (d) Oak Ridge National Laboratory, Oak Ridge, TN, USA; (e) University of Manchester, Manchester, United…

- Ahmad Abdelfattah, Marc Baboulin, +8 authors Stanimire Tomov
- ICCS
- 2016

We present a computational framework for high-performance tensor contractions on GPUs. High performance is difficult to obtain using existing libraries, especially for many independent contractions where each contraction is very small, e.g., sub-vector/warp in size. However, using our framework to batch contractions plus application-specifics, we…

- Marc Baboulin, Serge Gratton
- SIAM J. Matrix Analysis Applications
- 2011

We derive closed formulas for the condition number of a linear function of the total least squares solution. Given an overdetermined linear system Ax = b, we show that this condition number can be computed using the singular values and the right singular vectors of [A, b] and A. We also provide an upper bound that requires the computation of the largest…
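For orientation, the total least squares (TLS) solution mentioned in this abstract has a standard textbook characterization through the singular value decomposition. The block below states that characterization only; it is not the paper's condition-number formula. The symbols $m$, $n$, $z$, $\gamma$, $\sigma_{n+1}$ and $\sigma'_n$ are notation assumed here, with $A \in \mathbb{R}^{m \times n}$ and $m > n$.

```latex
% Notation assumed for this sketch:
%   \sigma_{n+1} : smallest singular value of the augmented matrix [A, b]
%   \sigma'_n    : smallest singular value of A
%   v_{n+1}      : right singular vector of [A, b] associated with \sigma_{n+1}
\[
  [A,\ b] = U \Sigma V^{T}, \qquad
  v_{n+1} = \begin{pmatrix} z \\ \gamma \end{pmatrix},
  \quad z \in \mathbb{R}^{n},\ \gamma \in \mathbb{R}.
\]
% If \sigma'_n > \sigma_{n+1}, then \gamma \neq 0 and the TLS solution is unique:
\[
  x_{\mathrm{TLS}} = -\frac{z}{\gamma}
                   = \left( A^{T} A - \sigma_{n+1}^{2} I \right)^{-1} A^{T} b .
\]
```

This shows why the singular values and right singular vectors of both [A, b] and A are the natural ingredients for any sensitivity analysis of the TLS solution.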

- Marc Baboulin, Jack J. Dongarra, Julien Herrmann, Stanimire Tomov
- ACM Trans. Math. Softw.
- 2013

We illustrate how linear algebra calculations can be enhanced by statistical techniques in the case of a square linear system Ax = b. We study a random transformation of A that enables us to avoid pivoting and then to reduce the amount of communication. Numerical experiments show that this randomization can be performed at…
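The idea of randomizing A so that pivoting can be skipped can be illustrated with a small sketch. The abstract does not say which random transformation the authors use, so the dense random matrices `W` and `V` below are an assumption made purely for illustration, as are all function names. The sketch solves Ax = b by factoring `W^T A V` without pivoting and then mapping the result back.

```python
import random

def matmul(X, Y):
    """Dense product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(X, v):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def lu_no_pivot(M):
    """Doolittle LU factorization with no pivoting.

    Raises ZeroDivisionError when a zero pivot is encountered.
    """
    n = len(M)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[float(x) for x in row] for row in M]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U

def solve_lu(L, U, rhs):
    """Forward substitution with L, then back substitution with U."""
    n = len(rhs)
    y = [0.0] * n
    for i in range(n):
        y[i] = rhs[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

def randomized_solve(A, b, seed=0):
    """Solve Ax = b via LU without pivoting on the randomized matrix W^T A V.

    W and V are generic dense random matrices (an illustrative assumption,
    not the transformation used in the paper).
    """
    rng = random.Random(seed)
    n = len(A)
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    V = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    Wt = transpose(W)
    Ar = matmul(matmul(Wt, A), V)   # randomized matrix
    L, U = lu_no_pivot(Ar)          # no row exchanges needed in practice
    y = solve_lu(L, U, matvec(Wt, b))
    return matvec(V, y)             # map back: x = V y
```

For example, `A = [[0.0, 1.0], [1.0, 0.0]]` has a zero leading pivot, so `lu_no_pivot(A)` fails, while `randomized_solve(A, [2.0, 3.0])` returns a vector close to `[3, 2]`. Dense random matrices cost as much to apply as the factorization itself, so a practical scheme would need a much cheaper structured transformation.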

We address some key issues in designing dense linear algebra (DLA) algorithms that are common for both multi/many-cores and special purpose architectures (in particular GPUs). We present them in the context of an LU factorization algorithm, where randomization techniques are used as an alternative to pivoting. This approach yields an algorithm based…

We study several solvers for the solution of general linear systems where the main objective is to reduce the communication overhead due to pivoting. We first describe two existing algorithms for the LU factorization on hybrid CPU/GPU architectures. The first one is based on partial pivoting and the second uses a random preconditioning of the original…

- Marc Baboulin, Luc Giraud, Serge Gratton
- IJHPCA
- 2005

In this paper we describe the parallel distributed implementation of a linear solver for large-scale applications involving real symmetric positive definite or complex symmetric non-Hermitian dense systems. The advantage of this routine is that it performs a Cholesky factorization by requiring half the storage needed by the standard parallel libraries…

- Grigori Fursin, Renato Miceli, +6 authors Davide Del Vento
- Scientific Programming
- 2014

Empirical auto-tuning and machine learning techniques have been showing high potential to improve execution time, power consumption, code size, reliability and other important metrics of various applications for more than two decades. However, they are still far from widespread production use due to lack of native support for auto-tuning in an ever changing…

- Marc Baboulin, Luc Giraud, Serge Gratton, Julien Langou
- Concurrency and Computation: Practice and…
- 2007

We propose in this paper a distributed packed storage format that exploits the symmetry or the triangular structure of a dense matrix. This format stores only half of the matrix while maintaining most of the efficiency compared to a full storage for a wide range of operations. This work has been motivated by the fact that, contrary to sequential linear…
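The storage saving behind a packed format can be sketched on a single machine. The code below uses the familiar sequential column-major packed layout for the lower triangle of a symmetric matrix; it is not the distributed format proposed in the paper, and the function names are hypothetical. It shows that n(n+1)/2 stored entries are enough to run a symmetric matrix-vector product.

```python
def packed_index(i, j, n):
    """Offset of symmetric entry (i, j) in a column-major packed lower triangle."""
    if i < j:
        i, j = j, i  # symmetry: A[i][j] == A[j][i]
    # Columns 0..j-1 hold n, n-1, ..., n-j+1 entries, i.e. j*n - j*(j-1)/2 in total.
    return j * n - j * (j - 1) // 2 + (i - j)

def pack_lower(A):
    """Copy the lower triangle of A, column by column, into a flat list."""
    n = len(A)
    return [A[i][j] for j in range(n) for i in range(j, n)]

def sym_matvec(packed, x, n):
    """Compute y = A x using only the packed lower triangle of symmetric A."""
    y = [0.0] * n
    for j in range(n):
        for i in range(j, n):
            a = packed[packed_index(i, j, n)]
            y[i] += a * x[j]
            if i != j:
                y[j] += a * x[i]  # contribution of the mirrored entry A[j][i]
    return y
```

A 3 x 3 symmetric matrix is stored in 6 entries instead of 9; for large n the packed list approaches half the size of full storage, which is the saving the abstract describes.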