
- Allan Snavely, Larry Carter, +5 authors Brian Koblenz
- SC
- 1998

The Tera MTA is a revolutionary commercial computer based on a multithreaded processor architecture. In contrast to many other parallel architectures, the Tera MTA can effectively exploit high degrees of parallelism on a single processor. By running multiple threads on a single processor, it can tolerate memory latency and keep the processor saturated. If…

- Kang Su Gatlin, Larry Carter
- HPCA
- 1999

This paper explores the interplay between algorithm design and a computer’s memory hierarchy. Matrix transpose and the bit-reversal reordering are important scientific subroutines which often exhibit severe performance degradation due to cache and TLB associativity problems. We give lower bounds showing that, for typical memory hierarchy designs, extra data…

- Larry Carter, Kang Su Gatlin
- FOCS
- 1998

The speed of many computations is limited not by the number of arithmetic operations but by the time it takes to move and rearrange data in the increasingly complicated memory hierarchies of modern computers. Array transpose and the bit-reversal permutation – trivial operations on a RAM – present non-trivial problems when designing highly-tuned scientific…

- Bowen Alpern, Larry Carter, Kang Su Gatlin
- SC
- 1995

The Smith-Waterman algorithm is a computationally-intensive string-matching operation that is fundamental to the analysis of proteins and genes. In this paper, we explore the use of some standard and novel techniques for improving its performance. We begin by tuning the algorithm using conventional techniques. These make modest performance improvements by…

- Kang Su Gatlin, Larry Carter
- SC
- 1999

Divide and conquer programs can achieve good performance on parallel computers and computers with deep memory hierarchies. We introduce architecture-cognizant divide and conquer algorithms, and explore how they can achieve even better performance. An architecture-cognizant algorithm has functionally-equivalent variants of the divide and/or combine functions,…

- Kang Su Gatlin
- ACM Queue
- 2004

We now sit firmly in the 21st century, where the grand challenge to the modern-day programmer is neither memory leaks nor type issues (both of those problems are now effectively solved), but rather issues of concurrency. How does one write increasingly complex programs where concurrency is a first-class concern? Or, even more treacherous, how does one debug…

- Kang Su Gatlin, Larry Carter
- IEEE PACT
- 2000

The Fast Fourier Transform (FFT) is one of the most important algorithms in computational science, accounting for large amounts of computing time. One major problem with modern FFT implementations is that they scale poorly to large problems. As the problem size increases, stride and associativity effects play a larger role. The result is a severe drop-off in…


- Vladimir Getov, Yuan Wei, Larry Carter, Kang Su Gatlin
- PDP
- 1999

The fast Fourier transform (FFT) is the cornerstone of many supercomputer applications and therefore needs careful performance tuning. Most often, however, the real performance of FFT implementations falls far below acceptable figures. In this paper, we explore several strategies for performance optimisation of the FFT computation, such as enhancing…

HPF is a data parallel Fortran dialect currently implemented on diverse hardware platforms ranging from workstations to massively parallel processors. To date, performance data are sparse. We will present preliminary measurements of selected benchmarks, comparing HPF applications against equivalent SPMD implementations and the same HPF implementation…
