Energy-efficient sorting using solid state disks

@article{Beckmann2010EnergyefficientSU,
  title={Energy-efficient sorting using solid state disks},
  author={Andreas Beckmann and Ulrich Meyer and Peter Sanders and John Victor Singler},
  journal={International Conference on Green Computing},
  year={2010},
  pages={191--202}
}
We take sorting of large data sets as a case study for making data-intensive applications more energy-efficient. Using a low-power processor, solid state disks, and efficient algorithms, we beat the current records in the JouleSort benchmark for 10 GB to 1 TB of data by factors of up to 5.1. Since we also use parallel processing, this usually comes without a performance penalty.
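
The JouleSort results mentioned above are reported as sorted records per joule. The following is a minimal sketch of that arithmetic, not the authors' measurement code. It assumes 100-byte records and decimal gigabytes, neither of which is stated in the abstract, and the function name `records_per_joule` is hypothetical. The input figures are the FAWNSort numbers quoted further down this page (10 GB, 21.2 s, 104.9 W).

```python
# Sketch of a JouleSort-style metric: sorted records per joule.
# Assumptions (not stated in the abstract): 100-byte records, 1 GB = 10**9 bytes.

RECORD_BYTES = 100  # assumed record size


def records_per_joule(data_bytes: int, seconds: float, avg_watts: float) -> float:
    """Return records sorted per joule, with energy = average power * time."""
    records = data_bytes / RECORD_BYTES
    energy_joules = avg_watts * seconds
    return records / energy_joules


# FAWNSort figures as quoted on this page: 10 GB in 21.2 s at 104.9 W average.
fawnsort = records_per_joule(10 * 10**9, 21.2, 104.9)
print(round(fawnsort))
```

Under these assumptions the result is roughly 45,000 records per joule, close to the 44884 quoted for FAWNSort; the small gap is plausibly due to rounding in the quoted time and power.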

The search for energy-efficient building blocks for the data center
TLDR
This paper surveys several small clusters of machines in search of the most energy-efficient data center building block for data-intensive computing. It builds five-node homogeneous clusters of each type and runs Dryad, a distributed execution engine, with a collection of data-intensive workloads to measure the energy consumption per task.
Data Structures: Time, I/Os, Entropy, Joules!
TLDR
Data compression and indexing nowadays play a key role in the design of modern algorithms for applications that manage massive datasets and can abundantly surpass the best expected technology advancements and the help from (sophisticated) operating systems or heuristics.
FAWNSort : Energy-efficient Sorting of 10 GB
TLDR
This system consists of a machine with a low-power server processor and five flash drives, sorting the 10 GB dataset in 21.2 seconds (±0.227 s) with an average power of 104.9 W, providing 44884 sorted records per Joule.
Engineering Algorithms for Large Data Sets
TLDR
This paper outlines the general challenges of algorithm engineering and gives examples from my work like sorting, full text indexing, graph algorithms, and database engines.
Elastic Prefetching for High-Performance Storage Devices
The spectrum of storage devices has expanded dramatically in the last several years with the increasing popularity of NAND flash
Impact of Programming Languages on Energy Consumption for Sorting Algorithms
TLDR
The main goal of this study is to find the programming language that consumes the least energy for sorting and thereby contributes to green computing.
Flashy prefetching for high-performance flash drives
TLDR
It is demonstrated that data prefetching, when effectively harnessing the high performance of SSDs, can provide significant performance benefits for a wide range of data-intensive applications.
Critical Evaluation of Existing External Sorting Methods in the Perspective of Modern Hardware
TLDR
In this work, the original assumptions of external sorting algorithms are critically evaluated empirically, and possible improvements are proposed.
(When) Do Multiple Passes Save Energy?
TLDR
This paper tries a strategy of executing a program in “multiple passes,” which reduces data accesses while retaining speed optimality and was shown to be effective for stencil computations on CPUs.

References

Building a parallel pipelined external memory algorithm library
TLDR
The STXXL library provides a framework for external memory algorithms with an easy-to-use interface for large and fast hard disks, but the clock speed of processors cannot keep up with the increasing bandwidth of parallel disks.
Nsort: a Parallel Sorting Program for NUMA and SMP Machines
TLDR
Ordinal™ Nsort™ is a high-performance sort program for SGI IRIX, Sun Solaris and HP-UX servers that can use tens of processors and hundreds of disks to quickly sort and merge data.
Scalable distributed-memory external sorting
TLDR
An algorithm whose I/O requirement is close to a lower bound is outlined. In contrast to naive implementations of multiway merging and all other approaches known to us, the algorithm works with just two passes over the data even for the largest conceivable inputs.
Building Energy-Efficient Systems for Sequential I/O Workloads
TLDR
This paper designs an SSD-based system for highly energy-efficient sequential I/O, and demonstrates that by trading latency for power, this system can achieve similar energy efficiency across a variety of systems, from embedded-class to desktop and server-class systems.
Asynchronous parallel disk sorting
We develop an algorithm for parallel disk sorting, whose I/O cost approaches the lower bound and that guarantees almost perfect overlap between I/O and computation. Previous algorithms have either
Energy-efficient cluster computing with FAWN: workloads and implications
TLDR
The architecture and motivation for a cluster-based, many-core computing architecture for energy-efficient, data-intensive computing are presented, and the longer-term implications of FAWN lead the authors to select a tightly integrated stacked chip-and-memory architecture for future FAWN development.
DEMSort — Distributed External Memory Sort
We present the results of our DEMSort program in various categories of the SortBenchmark. DEMSort is a sophisticated and highly tuned implementation of a mergesort-based algorithm. It makes use of
On Computational Models for Flash Memory Devices
TLDR
A broad range of existing external-memory algorithms and data structures based on the merging paradigm can be adapted efficiently into the unit-cost model, and the theoretical analysis of algorithms on these models corresponds to the empirical behavior of algorithms when using solid-state disks as external memory.
FAWNSort : Energy-efficient Sorting of 10 GB
TLDR
This system consists of a machine with a low-power server processor and five flash drives, sorting the 10 GB dataset in 21.2 seconds (±0.227 s) with an average power of 104.9 W, providing 44884 sorted records per Joule.