Performing Out-of-Core FFTs on Parallel Disk Systems

@article{Cormen1998PerformingOC,
  title={Performing Out-of-Core FFTs on Parallel Disk Systems},
  author={Thomas H. Cormen and David M. Nicol},
  journal={Parallel Comput.},
  year={1998},
  volume={24},
  pages={5--20}
}

Multiprocessor out-of-core FFTs with distributed memory and parallel disks (extended abstract)

TLDR
Performance results on a small workstation cluster indicate that, except for unusual combinations of problem size and memory size, the methods that do not perform interprocessor communication during the mini-butterfly computations require approximately 86% of the time of those that do.

Two Algorithms for Performing Multidimensional, Multiprocessor, Out-of-Core FFTs

We show two algorithms for computing multidimensional Fast Fourier Transforms (FFTs) on a multiprocessor system with distributed memory when problem sizes are so large that the data do not fit in the

Optimizing the Dimensional Method for Performing Multidimensional, Multiprocessor, Out-of-Core FFTs

We present an improved version of the Dimensional Method for computing multidimensional Fast Fourier Transforms (FFTs) on a multiprocessor system when the data consist of too many records to fit into

Parallel MATLAB for Extreme Virtual Memory

TLDR
The pMatlab XVM program was smaller, easier to implement and verify, and more efficient than its hand-coded equivalent, and the flexibility of pMatlab XVM allows hardware designers to experiment with FFT parameters in software before designing hardware for a real-time, ultra-long FFT.

Determining an Out-of-Core FFT Decomposition Strategy for Parallel Disks by Dynamic Programming

TLDR
An out-of-core FFT algorithm based on the in-core FFT method developed by Swarztrauber is presented, and it is shown how to use dynamic programming to determine optimal splits at each recursive stage.

Algorithms and Data Structures for External Memory

  • J. Vitter
  • Computer Science
    Found. Trends Theor. Comput. Sci.
  • 2006
TLDR
The state of the art in the design and analysis of algorithms and data structures for external memory (or EM for short), where the goal is to exploit locality and parallelism in order to reduce the I/O costs is surveyed.

Reducing I/O complexity by simulating coarse grained parallel algorithms

TLDR
A deterministic simulation technique which transforms coarse grained multicomputer (CGM) algorithms into external memory algorithms for the parallel disk model is presented, which optimizes block-wise data access and parallel disk I/O and, at the same time, utilizes multiple processors connected via a communication network or shared memory.

Fast Out-of-Core Sorting on Parallel Disk Systems

TLDR
The implementation of Rajasekaran's (l,m)-mergesort algorithm (LMM) for sorting on parallel disks is discussed, which is asymptotically optimal for large problems and has the additional advantage of a low constant in its I/O complexity.

References

Multiprocessor out-of-core FFTs with distributed memory and parallel disks (extended abstract)

TLDR
Performance results on a small workstation cluster indicate that, except for unusual combinations of problem size and memory size, the methods that do not perform interprocessor communication during the mini-butterfly computations require approximately 86% of the time of those that do.

FFTs in external or hierarchical memory

  • D. Bailey
  • Computer Science
    Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89)
  • 1989
TLDR
Advanced techniques for computing an ordered FFT on a computer with external or hierarchical memory that require as few as two passes through the external data set, employ strictly unit stride, long vector transfers between main memory and external storage, and are well suited for vector and parallel computation are described.

Redundant disk arrays - reliable, parallel secondary storage

TLDR
This dissertation presents analytic models for disk-array lifetime, evaluates these against event-driven simulation, and applies them to an example redundant disk array, showing that a 10% overhead for an N + 1-parity encoding plus a 10% overhead for on-line spares can provide higher reliability than the 100% overhead of conventional mirrored disks.

Asymptotically tight bounds for performing BMMC permutations on parallel disk systems

TLDR
One can determine efficiently at run time whether a permutation to be performed is BMMC and then avoid the general-permutation algorithm and save parallel I/Os by using the BMMC permutation algorithm herein.

Fast Fourier transform of externally stored data

TLDR
Two methods for computing the FFT of one-dimensional arrays of externally stored data are presented: one efficient when the data are only slightly larger than available internal memory, and one efficient when the data are much larger.

A case for redundant arrays of inexpensive disks (RAID)

TLDR
Five levels of RAIDs are introduced, giving their relative cost/performance, and are compared to an IBM 3380 and a Fujitsu Super Eagle.

An algorithm for the machine calculation of complex Fourier series

TLDR
Good generalized these methods and gave elegant algorithms for which one class of applications is the calculation of Fourier series, applicable to certain problems in which one must multiply an N-vector by an N × N matrix which can be factored into m sparse matrices.

Computational Frameworks for the Fast Fourier Transform

TLDR
The Radix-2 Frameworks, a collection of general and high performance FFTs designed to solve the multi-Dimensional FFT problem of Prime Factor and Convolution, are presented.

Evaluation Techniques for Storage Hierarchies

TLDR
A new and efficient method is described for determining, in one pass of an address trace, performance measures for a large class of demand-paged, multilevel storage systems utilizing a variety of mapping schemes and replacement algorithms.