Sorting on a Cluster Attached to a Storage-Area Network
@inproceedings{Wyllie2005SortingOA,
  title={Sorting on a Cluster Attached to a Storage-Area Network},
  author={Jim Wyllie},
  year={2005}
}
In November 2004, the SAN Cluster Sort program (SCS) set new records for the Indy versions of the Minute and TeraByte Sorts. SCS ran on a cluster of 40 dual-processor Itanium2 nodes on the show floor at the Supercomputing 2004 conference (SC04), performing its data accesses to 240 SAN-attached 8+P RAID5 arrays managed by the IBM General Parallel File System. This hardware and software combination achieved peak data transfer rates of over 14GB/sec, while sorting a 125GB input file in 58.7…
5 Citations
Brief announcement: TeraByte TokuSampleSort sorts 1TB in 197s
- Computer Science
- SPAA '09
- 2009
The tx2500 disk cluster at MIT Lincoln Laboratory sorted a terabyte (10^10 100-byte records) in 197s using an "Indy" sort, and in 297s using a "Daytona" sort. The sort employed a parallel sample sort,…
TritonSort: A Balanced Large-Scale Sorting System
- Computer Science
- NSDI
- 2011
We present TritonSort, a highly efficient, scalable sorting system. It is designed to process large datasets, and has been evaluated against as much as 100 TB of input data spread across 832 disks in…
TeraByte TokuSampleSort
- Computer Science
- 2007
Using the tx2500 disk cluster at MIT Lincoln Laboratory, I sorted a terabyte (10^10 100-byte records) in 197s using an “Indy” sort, and in 297s using a “Daytona” sort. I sorted 264GB in one minute…
TritonSort: A Balanced and Energy-Efficient Large-Scale Sorting System
- Computer Science
- TOCS
- 2013
This article describes the hardware and software architecture necessary to operate TritonSort, a highly efficient, scalable sorting system that is designed to process large datasets and is able to sort data at approximately 80% of the disks’ aggregate sequential write speed.
A "Measure of Transaction Processing" 20 Years Later
- Computer Science
- IEEE Data Eng. Bull.
- 2005
It is shown that improvement has exceeded Moore’s law – largely due to hardware improvements, software improvements, massive parallelism, and changing from mainframe to commodity economics.
References
High-performance sorting on networks of workstations
- Computer Science
- SIGMOD '97
- 1997
We report the performance of NOW-Sort, a collection of sorting implementations on a Network of Workstations (NOW). We find that parallel sorting on a NOW is competitive to sorting on the large-scale…
SPsort: How to Sort a Terabyte Quickly
- Computer Science
- 1999
In December 1998, a 488 node IBM RS/6000 SP sorted a terabyte of data (10 billion 100 byte records) in 17 minutes, 37 seconds. This is more than 2.5 times faster than the previous record for a…
GPFS: A Shared-Disk File System for Large Computing Clusters
- Computer Science
- FAST
- 2002
GPFS is IBM's parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. The paper discusses how distributed locking and recovery techniques were extended to scale to large clusters.
AlphaSort: A cache-sensitive parallel external sort
- Computer Science
- The VLDB Journal
- 1995
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads and argues that modern architectures require algorithm designers to re-examine their use of the memory hierarchy.
Datamation 2001: A Sorting Odyssey
- Computer Science
- 2001
The implementation of WIND-SORT, which broke the Datamation record by roughly a factor of two by sorting 1 million 100-byte records in 0.48 seconds, is described, and the elements behind the result are identified: developing a fast remote execution service, configuring the cluster properly, and avoiding the potential ill-effects of occasionally faulty hardware.
A super scalar sort algorithm for RISC processors
- Computer Science
- SIGMOD '96
- 1996
New sort algorithms which eliminate almost all the compares, provide functional parallelism which can be exploited by multiple execution units, significantly reduce the number of passes through keys, and improve data locality are developed.
A Minute with Nsort on a 32P NEC Windows Itanium2 Server
- Computer Science
- 2004
In March 2004, the Nsort program was able to sort 34 GB of data (340,000,000 100-byte records) in 58 seconds on a 32 processor Itanium® 2 NEC® Express5800/1320Xd running Microsoft® Windows® Server…
FASTSORT: AN EXTERNAL SORT USING PARALLEL PROCESSING
- Computer Science
- 2002
Performance measurements of FastSort are presented on various Tandem Nonstop processors, with particular emphasis on the speedup obtained by using parallelism to sort large files.
A measure of transaction processing power
- Computer Science, Economics
- 1985
These benchmarks measure the performance of diverse transaction processing systems and a standard system cost measure is stated and used to define price/performance metrics.
MPI: The Complete Reference
- Computer Science
- 1996
MPI: The Complete Reference is an annotated manual for the latest 1.1 version of the standard that illuminates the more advanced and subtle features of MPI and covers such advanced issues in parallel computing and programming as true portability, deadlock, high-performance message passing, and libraries for distributed and parallel computing.