- John A. Gunnels, Fred G. Gustavson, Greg Henry, Robert A. van de Geijn
- ACM Trans. Math. Softw.
- 2001

Since the advent of high-performance distributed-memory parallel computing, the need for intelligible code has become ever greater. The development and maintenance of libraries for these architectures is simply too complex to be amenable to conventional approaches to implementation. Attempts to employ traditional methodology have led, in our opinion, to the… (More)

- Richard P. Brent, Fred G. Gustavson, David Y. Y. Yun
- J. Algorithms
- 1980

We present two new algorithms, ADT and MDT, for solving order-n Toeplitz systems of linear equations $Tz = b$ in time $O(n \log^2 n)$ and space $O(n)$. The fastest algorithms previously known, such as Trench's algorithm, require time $\Omega(n^2)$ and require that all principal submatrices of T be nonsingular. Our algorithm ADT requires only that T be nonsingular. Both… (More)

- Erik Elmroth, Fred G. Gustavson, Isak Jonsson, Bo Kågström
- SIAM Review
- 2004

Matrix computations are both fundamental and ubiquitous in computational science and its vast application areas. Along with the development of more advanced computer systems with complex memory hierarchies, there is a continuing demand for new algorithms and library software that efficiently utilize and adapt to new architecture features. This article… (More)

- Sivan Toledo, Fred G. Gustavson
- IOPADS
- 1996

SOLAR is a portable high-performance library for out-of-core dense matrix computations. It combines portability with high performance by using existing high-performance in-core subroutine libraries and by using an optimized matrix input-output library. SOLAR works on parallel computers, workstations, and personal computers. It supports in-core computations… (More)

- Fred G. Gustavson, Lars Karlsson, Bo Kågström
- ACM Trans. Math. Softw.
- 2012

Techniques and algorithms for efficient in-place conversion to and from standard and blocked matrix storage formats are described. Such functionality is required by numerical libraries that use different data layouts internally. Parallel algorithms and a software package for in-place matrix storage format conversion based on in-place matrix transposition… (More)

- Erik Elmroth, Fred G. Gustavson
- IBM Journal of Research and Development
- 2000

Applying recursion to serial and parallel QR factorization leads to better performance. We present new recursive serial and parallel algorithms for QR factorization of an m by n matrix. They improve performance. The recursion leads to an automatic variable blocking, and it also replaces a Level 2 part in a standard block algorithm with Level 3 operations.… (More)

- Fred G. Gustavson
- ACM Trans. Math. Softw.
- 1978

Let A and B be two sparse matrices whose orders are p by q and q by r. Their product $C = AB$ requires N nontrivial multiplications, where $0 \le N \le pqr$. The operation count of our algorithm is usually proportional to N; however, its worst case is $O(p, r, N_A, N)$, where $N_A$ is the number of elements in A. This algorithm can be used to assemble the sparse matrix… (More)
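
The row-by-row idea behind a sparse product whose cost tracks N can be illustrated with a minimal sketch. This is not the paper's code: the dict-of-rows storage, the function name `sparse_matmul`, and the returned multiplication counter are assumptions made for illustration only.

```python
def sparse_matmul(a_rows, b_rows):
    """Multiply sparse matrices stored as {row: {col: value}} dicts.

    Hypothetical helper for illustration. Returns (c_rows, n_mults),
    where n_mults counts the scalar multiplications actually performed.
    """
    c_rows = {}
    n_mults = 0
    for i, a_row in a_rows.items():
        acc = {}  # sparse accumulator for row i of C
        for k, a_ik in a_row.items():
            # Only the stored entries of row k of B contribute to row i of C.
            for j, b_kj in b_rows.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a_ik * b_kj
                n_mults += 1
        if acc:
            c_rows[i] = acc
    return c_rows, n_mults


# A is 2 by 3 and B is 3 by 2; only stored (nonzero) entries are listed.
A = {0: {0: 1, 2: 2}, 1: {1: 3}}
B = {0: {1: 4}, 1: {0: 5}, 2: {1: 6}}
C, n = sparse_matmul(A, B)
print(C, n)
```

In this sketch the inner loops touch only pairs of stored entries that actually meet, so the work is proportional to the number of multiplications rather than to pqr.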

- Erik Elmroth, Fred G. Gustavson
- PARA
- 1998

We present a new recursive algorithm for the QR factorization of an m by n matrix A. The recursion leads to an automatic variable blocking that allows us to replace a level 2 part in a standard block algorithm by level 3 operations. However, there are some additional costs for performing the updates, which prohibit the efficient use of the recursion for large… (More)

- Ramesh C. Agarwal, Susanne M. Balle, Fred G. Gustavson, Mahesh V. Joshi, Prasad V. Palkar
- IBM Journal of Research and Development
- 1995

A three-dimensional (3D) matrix multiplication algorithm for massively parallel processing systems is presented. The P processors are configured as a "virtual" processing cube with dimensions $p_1$, $p_2$, and $p_3$ proportional to the matrices' dimensions M, N, and K. Each processor performs a single local matrix multiplication of size $M/p_1 \times N/p_2 \times K/p_3$. Before… (More)
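
The block decomposition described above can be simulated serially as a rough sketch. It assumes A is M by K and B is K by N, that each grid dimension divides the corresponding matrix dimension, and it does not model the communication steps, which the truncated abstract does not describe. The names `matmul_3d`, `block`, and `matmul` are hypothetical.

```python
def matmul(a, b):
    """Plain dense product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def block(m, r0, r1, c0, c1):
    """Copy the sub-matrix m[r0:r1, c0:c1]."""
    return [row[c0:c1] for row in m[r0:r1]]


def matmul_3d(a, b, p1, p2, p3):
    """Serial simulation of a p1 x p2 x p3 virtual processor cube.

    Assumption: a is M x K, b is K x N, and p1 | M, p2 | N, p3 | K.
    'Processor' (i, j, l) does one local product of size
    (M/p1 x K/p3) times (K/p3 x N/p2); partial results that share the
    same (i, j) are summed over l.
    """
    M, K, N = len(a), len(b), len(b[0])
    assert M % p1 == 0 and N % p2 == 0 and K % p3 == 0
    mb, nb, kb = M // p1, N // p2, K // p3
    c = [[0] * N for _ in range(M)]
    for i in range(p1):
        for j in range(p2):
            for l in range(p3):
                a_blk = block(a, i * mb, (i + 1) * mb, l * kb, (l + 1) * kb)
                b_blk = block(b, l * kb, (l + 1) * kb, j * nb, (j + 1) * nb)
                partial = matmul(a_blk, b_blk)  # the single local multiply
                for r in range(mb):
                    for s in range(nb):
                        c[i * mb + r][j * nb + s] += partial[r][s]
    return c


A = [[1, 2, 3, 4], [5, 6, 7, 8]]        # 2 x 4
B = [[1, 0], [0, 1], [2, 3], [4, 5]]    # 4 x 2
C = matmul_3d(A, B, p1=2, p2=1, p3=2)
print(C == matmul(A, B))
```

On a real machine the sum over `l` would be a reduction across processors; here it is just an in-place accumulation.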

Cache-oblivious algorithms have been advanced as a way of circumventing some of the difficulties of optimizing applications to take advantage of the memory hierarchy of modern microprocessors. These algorithms are based on the divide-and-conquer paradigm -- each division step creates sub-problems of smaller size, and when the working set of a sub-problem… (More)
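
As a minimal sketch of the divide-and-conquer pattern mentioned above, the following recursive matrix multiplication halves its largest dimension at each step and never refers to a cache size. It is an illustrative example only, not code from the paper; the name `rec_matmul` and the index-range interface are assumptions.

```python
def rec_matmul(a, b, c, i0, i1, j0, j1, k0, k1):
    """Compute c[i0:i1, j0:j1] += a[i0:i1, k0:k1] * b[k0:k1, j0:j1].

    Each call splits the largest of the three ranges in half, so the
    sub-problems shrink without any machine-specific tuning parameter.
    """
    di, dj, dk = i1 - i0, j1 - j0, k1 - k0
    if di == 0 or dj == 0 or dk == 0:
        return
    if di == 1 and dj == 1 and dk == 1:
        c[i0][j0] += a[i0][k0] * b[k0][j0]  # scalar base case
        return
    if di >= dj and di >= dk:
        mid = i0 + di // 2  # split the rows of A and C
        rec_matmul(a, b, c, i0, mid, j0, j1, k0, k1)
        rec_matmul(a, b, c, mid, i1, j0, j1, k0, k1)
    elif dj >= dk:
        mid = j0 + dj // 2  # split the columns of B and C
        rec_matmul(a, b, c, i0, i1, j0, mid, k0, k1)
        rec_matmul(a, b, c, i0, i1, mid, j1, k0, k1)
    else:
        mid = k0 + dk // 2  # split the shared inner dimension
        rec_matmul(a, b, c, i0, i1, j0, j1, k0, mid)
        rec_matmul(a, b, c, i0, i1, j0, j1, mid, k1)


A = [[1, 2], [3, 4], [5, 6]]          # 3 x 2
B = [[1, 0, 2, 1], [0, 1, 3, 1]]      # 2 x 4
C = [[0] * 4 for _ in range(3)]
rec_matmul(A, B, C, 0, 3, 0, 4, 0, 2)
print(C)
```

Recursing all the way down to scalars keeps the sketch short; a practical code would likely stop at a larger base case to limit call overhead.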