STXXL: standard template library for XXL data sets

@article{Dementiev2008STXXLST,
  title={STXXL: standard template library for XXL data sets},
  author={Roman Dementiev and Lutz Kettner and Peter Sanders},
  journal={Software: Practice and Experience},
  year={2008},
  volume={38}
}
We present the software library STXXL, an implementation of the C++ standard template library (STL) for processing huge data sets that can fit only on hard disks. It supports parallel disks and overlapping of disk I/O with computation, and it is the first I/O‐efficient algorithm library to support the pipelining technique, which can save more than half of the I/Os. STXXL has been applied in both academic and industrial environments for a range of problems including text processing… 
Using TPIE for processing massive data sets in C++
TLDR
The adaptation of I/O-efficient algorithms in commercial and research applications can be facilitated by well-designed software libraries, and the MapReduce and Hadoop frameworks are very popular for implementing algorithms on clusters with large numbers of computing nodes.
A Library to Support the Development of Applications that Process Huge Matrices in External Memory
TLDR
A new library, named TiledMatrix, supports the development of applications that process large matrices stored in external memory and provides an interface for external memory access that is similar to the traditional method of accessing a matrix.
External memory BWT and LCP computation for sequence collections with applications
TLDR
This work proposes a space-efficient algorithm to compute the BWT and LCP array for a collection of sequences in the external or semi-external memory setting and proves that this algorithm performs O(n · maxlcp) sequential I/Os, where n is the total length of the collection and maxlcp is the maximum LCP value.
Review of algorithms and data structures: the basic toolbox by Kurt Mehlhorn and Peter Sanders
A toolbox: in the hands of an accomplished craftsman, it has a carefully selected set of the most commonly used implements appropriate to the most frequently encountered tasks. In the hands of a hack,
External memory pipelining made easy with TPIE
TLDR
A major extension of the TPIE library is presented that includes a pipelining framework that allows for practically efficient streaming-based implementations of I/O-efficient algorithms while minimizing I/O overhead between streaming components.
Algorithms and Data Structures for External Memory
  • J. Vitter
  • Computer Science
    Found. Trends Theor. Comput. Sci.
  • 2006
TLDR
The state of the art in the design and analysis of algorithms and data structures for external memory (or EM for short), where the goal is to exploit locality and parallelism in order to reduce the I/O costs is surveyed.
Tackling latency using FG
TLDR
FG, short for Asynchronous Buffered Computation Design and Engineering Framework Generator, is a programming framework that helps to mitigate latency in out-of-core programs that run on distributed-memory clusters; FG's interaction with these real-world programs is shown.
MCSTL: the multi-core standard template library
TLDR
This work presents performance measurements on several architectures and concludes that simple recompilation will provide partial parallelization of applications that make consistent use of the C++ Standard Template Library.
Effective Use of SSDs in Database Systems
TLDR
Novel methods are proposed to exploit the new capabilities of modern SSDs to improve the performance of database systems and an SSD-friendly external merge sort is proposed that has better performance than other common external sorting techniques.
The GNU libstdc++ parallel mode: software engineering considerations
TLDR
The C++ Standard Library implementation provided with the free GNU C++ compiler, libstdc++, provides a "parallel mode" that enables existing serial code to take advantage of many parallelized STL algorithms, an approach to making use of multi-core processors, which are now or soon will be ubiquitous.
...

References

SHOWING 1-10 OF 120 REFERENCES
STXXL: Standard Template Library for XXL Data Sets
TLDR
STXXL is an implementation of the C++ standard template library (STL) for external memory computations that supports parallel disks, overlapping of I/O and computation, and a pipelining technique that can save more than half of the I/Os.
Algorithm engineering for large data sets
TLDR
This book teaches students, researchers and software developers how the interplay of hardware, software, and state-of-the-art algorithms helps to achieve high-performance processing of massive data.
STAPL: An Adaptive, Generic Parallel C++ Library
TLDR
This work presents results obtained using STAPL for a molecular dynamics code and a particle transport code, and presents functionality to allow the user to further optimize the code and achieve additional performance gains.
Implementing I/O-efficient Data Structures Using TPIE
TLDR
The design and implementation of the second phase of TPIE, a portable, extensible, flexible, and easy to use C++ programming environment for efficiently implementing I/O-algorithms and data structures, is described.
LEDA-SM: external memory algorithms and data structures in theory and practice
TLDR
The functionality of external memory, which is realized by disk drives, is explained, and the most important theoretical I/O models are introduced, including the C++ class library LEDA-SM.
I/O Efficient Scientific Computation Using TPIE
TLDR
This paper discusses algorithmic issues underlying the design and implementation of the relevant components of TPIE and presents performance results of programs written to solve a series of benchmark problems using the current TPIE prototype.
External memory algorithms and data structures: dealing with massive data
TLDR
The state of the art in the design and analysis of external memory algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs is surveyed.
PSTL-A C++ Persistent Standard Template Library
TLDR
The Persistent Standard Template Library (PSTL) is designed, which provides its own containers that are compatible with STL, but store their elements on disk, and enables the reuse of many of the algorithms provided by STL in combination with PSTL.
MCSTL: the multi-core standard template library
TLDR
This work presents performance measurements on several architectures and concludes that simple recompilation will provide partial parallelization of applications that make consistent use of the C++ Standard Template Library.
GPUTeraSort: high performance graphics co-processor sorting for large database management
TLDR
Overall, the results indicate that using a GPU as a co-processor can significantly improve the performance of sorting algorithms on large databases.
...