• Corpus ID: 10666610

Sorting on a Cluster Attached to a Storage-Area Network

Jim Wyllie
In November 2004, the SAN Cluster Sort program (SCS) set new records for the Indy versions of the Minute and TeraByte Sorts. SCS ran on a cluster of 40 dual-processor Itanium2 nodes on the show floor at the Supercomputing 2004 conference (SC04), performing its data accesses to 240 SAN-attached 8+P RAID5 arrays managed by the IBM General Parallel File System. This hardware and software combination achieved peak data transfer rates of over 14GB/sec, while sorting a 125GB input file in 58.7… 

Brief announcement: TeraByte TokuSampleSort sorts 1TB in 197s
The tx2500 disk cluster at MIT Lincoln Laboratory sorted a terabyte (10^10 100-byte records) in 197s using an "Indy" sort, and in 297s using a "Daytona" sort. The sort employed a parallel sample sort,
TritonSort: A Balanced Large-Scale Sorting System
We present TritonSort, a highly efficient, scalable sorting system. It is designed to process large datasets, and has been evaluated against as much as 100 TB of input data spread across 832 disks in
TeraByte TokuSampleSort
Using the tx2500 disk cluster at MIT Lincoln Laboratory, I sorted a terabyte (10^10 100-byte records) in 197s using an “Indy” sort, and in 297s using a “Daytona” sort. I sorted 264GB in one minute
TritonSort: A Balanced and Energy-Efficient Large-Scale Sorting System
This article describes the hardware and software architecture necessary to operate TritonSort, a highly efficient, scalable sorting system that is designed to process large datasets and can sort data at approximately 80% of the disks’ aggregate sequential write speed.
A "Measure of Transaction Processing" 20 Years Later
  • J. Gray
  • Computer Science
    IEEE Data Eng. Bull.
  • 2005
It is shown that improvement has exceeded Moore’s law – largely due to hardware improvements, software improvements, massive parallelism, and changing from mainframe to commodity economics.


High-performance sorting on networks of workstations
We report the performance of NOW-Sort, a collection of sorting implementations on a Network of Workstations (NOW). We find that parallel sorting on a NOW is competitive to sorting on the large-scale
SPsort: How to Sort a Terabyte Quickly
In December 1998, a 488 node IBM RS/6000 SP sorted a terabyte of data (10 billion 100 byte records) in 17 minutes, 37 seconds. This is more than 2.5 times faster than the previous record for a
GPFS: A Shared-Disk File System for Large Computing Clusters
GPFS is IBM's parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. The paper discusses how distributed locking and recovery techniques were extended to scale to large clusters.
Alphasort: A cache-sensitive parallel external sort
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads and argues that modern architectures require algorithm designers to re-examine their use of the memory hierarchy.
Datamation 2001: A Sorting Odyssey
The implementation of WIND-SORT, which broke the Datamation record by roughly a factor of two, sorting 1 million 100-byte records in 0.48 seconds, has been identified: developing a fast remote execution service, configuring the cluster properly, and avoiding the potential ill-effects of occasionally faulty hardware.
A super scalar sort algorithm for RISC processors
New sort algorithms which eliminate almost all the compares, provide functional parallelism which can be exploited by multiple execution units, significantly reduce the number of passes through keys, and improve data locality are developed.
A Minute with Nsort on a 32P NEC Windows Itanium2 Server
In March 2004, the Nsort program was able to sort 34 GB of data (340,000,000 100-byte records) in 58 seconds on a 32 processor Itanium® 2 NEC® Express5800/1320Xd running Microsoft® Windows® Server
Performance measurements of FastSort are presented on various Tandem Nonstop processors, with particular emphasis on the speedup obtained by using parallelism to sort large files.
A measure of transaction processing power
These benchmarks measure the performance of diverse transaction processing systems and a standard system cost measure is stated and used to define price/performance metrics.
MPI: The Complete Reference
MPI: The Complete Reference is an annotated manual for the latest 1.1 version of the standard that illuminates the more advanced and subtle features of MPI and covers such advanced issues in parallel computing and programming as true portability, deadlock, high-performance message passing, and libraries for distributed and parallel computing.