#### Filter Results:

#### Publication Year

2009

2016

#### Publication Type

#### Co-author

#### Key Phrase

#### Publication Venue

Learn More

We introduce a new collection of solvers – subsequently called EleMRRR – for large-scale dense Hermitian eigenproblems. EleMRRR solves various types of problems: generalized, standard, and tridiagonal eigenproblems. Among these, the last is of particular importance as it is a solver on its own right, as well as the computational kernel for the first two; we… (More)

—It is well known that the behavior of dense linear algebra algorithms is greatly influenced by factors like target architecture, underlying libraries and even problem size; because of this, the accurate prediction of their performance is a real challenge. In this article, we are not interested in creating accurate models for a given algorithm, but in… (More)

Tensor operations are surging as the computational building blocks for a variety of scientific simulations and the development of high-performance kernels for such operations is known to be a challenging task. While for operations on one-and two-dimensional tensors there exist standardized interfaces and highly-optimized libraries (BLAS), for higher… (More)

- Elmar Peise
- ArXiv
- 2012

I hereby declare that this thesis is entirely the result of my own work except where otherwise indicated. I have only used the resources given in the list of references.

It is universally known that caching is critical to attain high-performance implementations: In many situations, data locality (in space and time) plays a bigger role than optimizing the (number of) arithmetic floating point operations. In this paper, we show evidence that at least for linear algebra algorithms, caching is also a crucial factor for accurate… (More)

In order to associate complex traits with genetic polymorphisms, genome-wide association studies process huge datasets involving tens of thousands of individuals genotyped for millions of polymorphisms. When handling these datasets, which exceed the main memory of contemporary computers, one faces two distinct challenges: 1) Millions of polymorphisms come… (More)

One of the greatest effort of computational scientists is to translate the mathematical model describing a class of physical phenomena into large and complex codes. Many of these codes face the difficulty of implementing the mathematical operations in the model in terms of low level optimized kernels offering both performance and portability. Legacy codes… (More)

To exploit both memory locality and the full performance potential of highly tuned kernels, dense linear algebra libraries such as LAPACK commonly implement operations as blocked algorithms. However, to achieve next-to-optimal performance with such algorithms, significant tuning is required. On the other hand, recursive algorithms are virtually tuning free,… (More)

Optimal use of computing resources requires extensive coding , tuning and benchmarking. To boost developer productivity in these time consuming tasks, we introduce the Experimental Linear Algebra Performance Studies framework (ELAPS), a multi-platform open source environment for fast yet powerful performance experimentation with dense linear algebra… (More)

In order to associate complex traits with genetic polymorphisms, genome-wide association studies process huge datasets involving tens of thousands of individuals genotyped for millions of polymorphisms. When handling these datasets, which exceed the main memory of contemporary computers, one faces two distinct challenges: 1) Millions of polymorphisms and… (More)