Compressed Linear Algebra for Large-Scale Machine Learning
@article{Elgohary2016CompressedLA,
  title   = {Compressed Linear Algebra for Large-Scale Machine Learning},
  author  = {Ahmed Elgohary and Matthias Boehm and Peter J. Haas and Frederick Reiss and Berthold Reinwald},
  journal = {Proc. VLDB Endow.},
  year    = {2016},
  volume  = {9},
  pages   = {960-971}
}
Large-scale machine learning (ML) algorithms are often iterative, using repeated read-only data access and I/O-bound matrix-vector multiplications to converge to an optimal model. It is crucial for performance to fit the data into single-node or distributed main memory. General-purpose, heavy- and lightweight compression techniques struggle to achieve both good compression ratios and fast decompression speed to enable block-wise uncompressed operations. Hence, we initiate work on compressed…
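To make the core idea concrete, here is a minimal sketch of executing a matrix-vector product directly on compressed columns. It uses plain run-length encoding per column; the paper's actual encodings, column co-coding, and operator implementations are not reproduced, and the names `rle_compress_column` and `rle_matvec` are illustrative only.

```python
# Minimal sketch: matrix-vector multiply on run-length-encoded columns,
# without ever materializing the dense matrix.
import numpy as np

def rle_compress_column(col):
    """Compress a dense 1-D array into (value, start_row, run_length) triples."""
    runs, i, n = [], 0, len(col)
    while i < n:
        j = i
        while j + 1 < n and col[j + 1] == col[i]:
            j += 1
        if col[i] != 0:                      # zero runs need not be stored
            runs.append((col[i], i, j - i + 1))
        i = j + 1
    return runs

def rle_matvec(rle_columns, x, n_rows):
    """Compute y = M @ x where column j of M is given by rle_columns[j]."""
    y = np.zeros(n_rows)
    for j, runs in enumerate(rle_columns):
        if x[j] == 0:
            continue
        for value, start, length in runs:
            # scale a whole run at once; M is never decompressed
            y[start:start + length] += value * x[j]
    return y

# toy usage
M = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 2., 0.],
              [3., 2., 5.]])
cols = [rle_compress_column(M[:, j]) for j in range(M.shape[1])]
x = np.array([2., 1., 1.])
assert np.allclose(rle_matvec(cols, x, M.shape[0]), M @ x)
```

The point of the sketch is that each run is scaled as a whole, so the operation works at compressed size and only the result vector is dense.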
39 Citations
Compressed linear algebra for large-scale machine learning
- Computer Science, The VLDB Journal
- 2017
This work introduces value-based compressed linear algebra (CLA), in which heterogeneous, lightweight database compression techniques are applied to matrices, and linear algebra operations such as matrix-vector multiplication are executed directly on the compressed representation.
Scaling Machine Learning via Compressed Linear Algebra
- Computer Science, SIGMOD Record
- 2017
Large-scale machine learning algorithms are often iterative, using repeated read-only data access and I/O-bound matrix-vector multiplications to converge to an optimal model, so effective column compression schemes, cache-conscious operations, and an efficient sampling-based compression algorithm are needed.
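As a rough illustration of sampling-based compression planning, the toy heuristic below inspects a small row sample per column and picks a scheme. The thresholds and the decision rule are invented for illustration and are not the estimators used in the paper.

```python
# Toy sampling-based compression planner: estimate per-column compressibility
# from a row sample and choose RLE, dictionary encoding, or no compression.
import numpy as np

def plan_compression(M, sample_rows=100, seed=0, dict_ratio=0.05, min_run_len=4.0):
    rng = np.random.default_rng(seed)
    n_rows = M.shape[0]
    idx = np.sort(rng.choice(n_rows, size=min(sample_rows, n_rows), replace=False))
    plan = []
    for j in range(M.shape[1]):
        s = M[idx, j]                                  # column sample in row order
        distinct_frac = len(np.unique(s)) / len(s)     # proxy for dictionary size
        n_runs = 1 + int(np.count_nonzero(np.diff(s))) # proxy for run structure
        avg_run = len(s) / n_runs
        if avg_run >= min_run_len:
            plan.append("RLE")
        elif distinct_frac <= dict_ratio:
            plan.append("DICT")
        else:
            plan.append("UNCOMPRESSED")
    return plan

# toy usage: constant column, low-cardinality column, random column
rng = np.random.default_rng(1)
M = np.column_stack([np.ones(1000),
                     rng.integers(0, 3, size=1000).astype(float),
                     rng.standard_normal(1000)])
print(plan_compression(M))   # e.g. ['RLE', 'DICT', 'UNCOMPRESSED']
```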
Compressed linear algebra for declarative large-scale machine learning
- Computer Science, Commun. ACM
- 2019
This work introduces Compressed Linear Algebra (CLA) for lossless matrix compression, which encodes matrices with lightweight, value-based compression techniques and executes linear algebra operations directly on the compressed representations.
Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent
- Computer Science, SIGMOD Conference
- 2019
This work proposes a new lossless compression scheme called tuple-oriented compression (TOC) that is inspired by an unlikely source, the string/text compression scheme Lempel-Ziv-Welch, but tailored to mini-batch stochastic gradient descent in a way that preserves tuple boundaries within mini-batches.
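For orientation, the sketch below shows plain Lempel-Ziv-Welch encoding over a generic token sequence; TOC itself adapts this idea to numeric feature tuples and preserves tuple boundaries within mini-batches, which this generic version does not attempt.

```python
# Plain LZW: grow a dictionary of token sequences and emit integer codes.
def lzw_encode(tokens):
    """Encode a sequence of hashable tokens into integer codes."""
    dictionary, next_code = {}, 0
    for t in tokens:                      # seed with single-token entries
        if (t,) not in dictionary:
            dictionary[(t,)] = next_code
            next_code += 1
    out, current = [], ()
    for t in tokens:
        candidate = current + (t,)
        if candidate in dictionary:
            current = candidate           # extend the current phrase
        else:
            out.append(dictionary[current])
            dictionary[candidate] = next_code   # learn the new phrase
            next_code += 1
            current = (t,)
    if current:
        out.append(dictionary[current])
    return out, dictionary

codes, _ = lzw_encode(list("abababab"))
print(codes)   # repeated phrases collapse into single codes
```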
FlashR: parallelize and scale R for machine learning using SSDs
- Computer Science, PPoPP 2018
- 2018
Despite the huge performance gap between SSDs and RAM, FlashR on SSDs closely tracks the performance of FlashR in memory for many algorithms, and the R implementations in FlashR outperform H2O and Spark MLlib by a factor of 3 to 20.
Technical Perspective: Scaling Machine Learning via Compressed Linear Algebra
- Computer Science, SIGMOD Record
- 2017
The paper cleverly adapts ideas first developed in relational database systems — column-oriented compression, sampling-based cost estimation, trading between compression speed and compression rate — to build an elegant implementation of compressed linear algebra operations.
Lightweight Data Compression Algorithms: An Experimental Survey (Experiments and Analyses)
- Computer Science, EDBT
- 2017
This work conducted an exhaustive experimental survey by evaluating several state-of-the-art compression algorithms as well as cascades of basic techniques, finding that there is no single-best algorithm.
Beyond Straightforward Vectorization of Lightweight Data Compression Algorithms for Larger Vector Sizes
- Computer Science, Grundlagen von Datenbanken
- 2018
A novel implementation concept for run-length encoding using the conflict-detection operations introduced in Intel’s AVX-512 SIMD extension is presented, and different data layouts for vectorization and their impact on wider vector sizes are investigated.
BlockJoin: Efficient Matrix Partitioning Through Joins
- Computer Science, Proc. VLDB Endow.
- 2017
BlockJoin is presented, a distributed join algorithm which directly produces block-partitioned results and applies database techniques known from columnar processing, such as index-joins and late materialization, in the context of parallel dataflow engines.
Low Level Big Data Compression
- Computer Science, KDIR
- 2018
This work proposes a mechanism for storing and processing categorical information by compressing it at the bit level, together with block-wise compression and decompression, so that processing the compressed information resembles processing the original information.
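A hedged sketch of the bit-level idea (the paper's block layout and compressed operators are not reproduced; `pack_codes` and `unpack_codes` are illustrative names): categorical codes are packed using the minimum number of bits per value.

```python
# Pack small integer category codes into a single integer, bit by bit.
import math

def pack_codes(codes, num_categories):
    """Pack category codes using ceil(log2(num_categories)) bits each."""
    bits = max(1, math.ceil(math.log2(num_categories)))
    packed = 0
    for i, c in enumerate(codes):
        packed |= c << (i * bits)
    return packed, bits

def unpack_codes(packed, bits, count):
    """Recover the original codes from the packed representation."""
    mask = (1 << bits) - 1
    return [(packed >> (i * bits)) & mask for i in range(count)]

codes = [0, 3, 2, 1, 3]
packed, bits = pack_codes(codes, num_categories=4)   # 2 bits per code
assert unpack_codes(packed, bits, len(codes)) == codes
```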
References
Showing 1-10 of 84 references
An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication
- Computer Science, IEEE Transactions on Parallel and Distributed Systems
- 2013
A compressed storage format, called Compressed Sparse eXtended (CSX), is proposed that is able to detect and simultaneously encode multiple commonly encountered substructures inside a sparse matrix, considerably reducing the memory footprint of sparse matrices and alleviating the pressure on the memory subsystem.
SLACID - sparse linear algebra in a column-oriented in-memory database system
- Computer Science, SSDBM '14
- 2014
This paper presents and compares different approaches of storing sparse matrices in an in-memory column-oriented database system and shows that a system layout derived from the compressed sparse row representation integrates well with a columnar database design and is moreover amenable to a wide range of non-numerical use cases when dictionary encoding is used.
Optimizing sparse matrix-vector multiplication using index and value compression
- Computer Science, CF '08
- 2008
This paper proposes two distinct compression methods targeting index and numerical values, respectively, and demonstrates that the index compression method can be applied successfully to a wide range of matrices, while the value compression method achieves impressive speedups on a more limited yet important class of sparse matrices that contain a small number of distinct values.
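The sketch below illustrates the two ingredients in combination, delta-encoded column indices plus a dictionary of distinct values, assuming a standard CSR input; it is not the exact compressed CSR layouts proposed in the paper.

```python
# Sparse matrix-vector multiply over delta-encoded indices and a value dictionary.
import numpy as np

def compress_csr(indptr, indices, data):
    values, value_ids = np.unique(data, return_inverse=True)   # value dictionary
    deltas = []
    for r in range(len(indptr) - 1):
        prev = 0
        for c in indices[indptr[r]:indptr[r + 1]]:
            deltas.append(c - prev)        # small deltas compress well downstream
            prev = c
    return indptr, np.array(deltas), value_ids, values

def spmv_compressed(indptr, deltas, value_ids, values, x):
    y, k = np.zeros(len(indptr) - 1), 0
    for r in range(len(indptr) - 1):
        col, acc = 0, 0.0
        for _ in range(indptr[r + 1] - indptr[r]):
            col += deltas[k]               # reconstruct the column index on the fly
            acc += values[value_ids[k]] * x[col]
            k += 1
        y[r] = acc
    return y

# toy 2x3 matrix [[5, 0, 5], [0, 7, 0]] in CSR form
indptr  = np.array([0, 2, 3])
indices = np.array([0, 2, 1])
data    = np.array([5.0, 5.0, 7.0])
comp = compress_csr(indptr, indices, data)
assert np.allclose(spmv_compressed(*comp, np.array([1.0, 2.0, 3.0])), [20.0, 14.0])
```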
On optimizing machine learning workloads via kernel fusion
- Computer Science, PPoPP 2015
- 2015
An analytical model is presented that considers input data characteristics and available GPU resources to estimate near-optimal settings for kernel launch parameters and demonstrates the effectiveness of the fused kernel approach in improving end-to-end performance on an entire ML algorithm.
An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs
- Computer Science, ICS '14
- 2014
A new blocked row-column (BRC) storage format with a novel two-dimensional blocking mechanism is proposed that effectively addresses these challenges: it reduces thread divergence by reordering and grouping rows of the input matrix with nearly equal numbers of non-zero elements onto the same execution units (i.e., warps).
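A much-simplified sketch of the row-grouping idea, assuming a warp-size parameter and ignoring the two-dimensional blocking and the GPU kernel itself:

```python
# Group rows with similar non-zero counts into the same warp to reduce divergence.
def group_rows_by_nnz(row_nnz, warp_size=32):
    """Return warps (lists of row ids) grouping rows with similar nnz counts."""
    order = sorted(range(len(row_nnz)), key=lambda r: row_nnz[r], reverse=True)
    return [order[i:i + warp_size] for i in range(0, len(order), warp_size)]

# toy usage with warp_size=2: heavy rows land together, light rows land together
warps = group_rows_by_nnz([1, 17, 3, 16, 2, 15], warp_size=2)
print(warps)   # [[1, 3], [5, 2], [4, 0]]
```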
Implementing sparse matrix-vector multiplication on throughput-oriented processors
- Computer Science, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
- 2009
This work explores SpMV methods that are well-suited to throughput-oriented architectures like the GPU and which exploit several common sparsity classes, including structured grid and unstructured mesh matrices.
Super-Scalar RAM-CPU Cache Compression
- Computer Science, 22nd International Conference on Data Engineering (ICDE'06)
- 2006
This work proposes three new versatile compression schemes (PDICT, PFOR, and PFOR-DELTA) that are specifically designed to extract maximum IPC from modern CPUs and compares these algorithms with compression techniques used in (commercial) database and information retrieval systems.
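A simplified sketch of the patched frame-of-reference idea behind PFOR: the real scheme bit-packs the slots and patches exceptions in place, whereas this version keeps exceptions in a side list purely for clarity.

```python
# Patched frame-of-reference (simplified): small offsets inline, outliers as exceptions.
def pfor_encode(values, bits):
    base, limit = min(values), 1 << bits
    slots, exceptions = [], []
    for i, v in enumerate(values):
        delta = v - base
        if delta < limit:
            slots.append(delta)            # fits in `bits` bits
        else:
            slots.append(0)                # placeholder, patched on decode
            exceptions.append((i, v))
    return base, slots, exceptions

def pfor_decode(base, slots, exceptions):
    out = [base + s for s in slots]
    for i, v in exceptions:                # patch the outliers back in
        out[i] = v
    return out

vals = [3, 4, 5, 4, 250, 6]
assert pfor_decode(*pfor_encode(vals, bits=3)) == vals
```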
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Systems
- Computer Science, BTW
- 2017
An exhaustive evaluation indicates that Gilbert is able to process varying amounts of data exceeding the memory of a single computer on clusters of different sizes and simplifies the development process significantly due to its high-level programming abstraction.
Dictionary-based order-preserving string compression for main memory column stores
- Computer Science, SIGMOD Conference
- 2009
This paper proposes new data structures that efficiently support an order-preserving dictionary compression for (variable-length) string attributes with a large domain size that is likely to change over time, and introduces a novel indexing approach that provides efficient access paths to such a dictionary while compressing the index data.
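A minimal sketch of order-preserving dictionary encoding (the paper's index structures for evolving domains are not reproduced): because codes are assigned in sort order, range predicates on strings can be evaluated directly on the integer codes.

```python
# Order-preserving dictionary: code order mirrors string order.
import bisect

def build_dictionary(strings):
    sorted_values = sorted(set(strings))
    code_of = {s: i for i, s in enumerate(sorted_values)}   # codes in sort order
    return sorted_values, code_of

def range_predicate_codes(sorted_values, low, high):
    """Translate a string range [low, high] into an inclusive code range."""
    lo = bisect.bisect_left(sorted_values, low)
    hi = bisect.bisect_right(sorted_values, high) - 1
    return lo, hi

names = ["Whole Milk - Gallon", "Apple Juice", "Whole Milk - Quart", "Banana"]
sorted_values, code_of = build_dictionary(names)
codes = [code_of[s] for s in names]
lo, hi = range_predicate_codes(sorted_values, "A", "C")
matches = [names[i] for i, c in enumerate(codes) if lo <= c <= hi]
assert matches == ["Apple Juice", "Banana"]   # evaluated on codes only
```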
Compressed Nonnegative Matrix Factorization Is Fast and Accurate
- Computer Science, IEEE Transactions on Signal Processing
- 2016
This work proposes to use structured random compression, that is, random projections that exploit the data structure, for two NMF variants, classical and separable, and shows that the resulting compressed techniques are faster than their uncompressed variants, vastly reduce memory demands, and do not incur any significant deterioration in performance.
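As a generic illustration only (a plain Gaussian projection rather than the structured, data-aware projections and dedicated compressed NMF updates of the paper), the following shows how a data matrix can be compressed via a randomized range finder before factorization:

```python
# Randomized compression of a nonnegative data matrix prior to factorization.
import numpy as np

rng = np.random.default_rng(0)
X = np.abs(rng.standard_normal((10000, 500)))   # toy nonnegative data matrix
r, oversample = 20, 10                          # target rank plus oversampling
Omega = rng.standard_normal((X.shape[1], r + oversample))
Y = X @ Omega                                   # tall, thin sketch of range(X)
Q, _ = np.linalg.qr(Y)                          # orthonormal basis, 10000 x 30
X_small = Q.T @ X                               # 30 x 500 compressed matrix
# Factorization iterations can then work with Q and X_small instead of the full X.
```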