- Matteo Frigo, Steven G. Johnson
- Proceedings of the IEEE
- 2005

FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that…

- Matteo Frigo, Charles E. Leiserson, Keith H. Randall
- PLDI
- 1998

The fifth release of the multithreaded language Cilk uses a provably good "work-stealing" scheduling algorithm similar to the first system, but the language has been completely redesigned and the…

- Matteo Frigo, Steven G. Johnson
- ICASSP
- 1998

FFT literature has been mostly concerned with minimizing the number of floating-point operations performed by an algorithm. Unfortunately, on present-day microprocessors this measure is far less…

- Matteo Frigo
- PLDI
- 1999

The FFTW library for computing the discrete Fourier transform (DFT) has gained a wide acceptance in both academia and industry, because it provides excellent performance on a variety of machines…

This paper introduces a storage format for sparse matrices, called compressed sparse blocks (CSB), which allows both Ax and A^T x to be computed efficiently in…

- Steven G. Johnson, Matteo Frigo
- IEEE Transactions on Signal Processing
- 2007

Recent results by Van Buskirk have broken the record set by Yavne in 1968 for the lowest exact count of real additions and multiplications to compute a power-of-two discrete Fourier transform (DFT). …

- Matteo Frigo, Volker Strumpen
- ICS
- 2005

We present a cache oblivious algorithm for stencil computations, which arise for example in finite-difference methods. Our algorithm applies to arbitrary stencils in n-dimensional spaces. On…

This paper introduces hyperobjects, a linguistic mechanism that allows different branches of a multithreaded program to maintain coordinated local views of the same nonlocal variable. We have…

- Matteo Frigo, Volker Strumpen
- Theory of Computing Systems
- 2006

We present a technique for analyzing the number of cache misses incurred by multithreaded cache oblivious algorithms on an idealized parallel machine in which each processor has a private cache. We…

This paper describes FFTW, a portable C package for computing the one- and multidimensional complex discrete Fourier transform (DFT). FFTW is typically faster than all other publicly available DFT…