# Compressed Linear Algebra for Large-Scale Machine Learning

```bibtex
@article{Elgohary2016CompressedLA,
  title   = {Compressed Linear Algebra for Large-Scale Machine Learning},
  author  = {Ahmed Elgohary and Matthias Boehm and Peter J. Haas and Frederick Reiss and Berthold Reinwald},
  journal = {Proc. VLDB Endow.},
  year    = {2016},
  volume  = {9},
  pages   = {960--971}
}
```

Large-scale machine learning (ML) algorithms are often iterative, using repeated read-only data access and I/O-bound matrix-vector multiplications to converge to an optimal model. It is crucial for performance to fit the data into single-node or distributed main memory. General-purpose, heavy- and lightweight compression techniques struggle to achieve both good compression ratios and fast decompression speed to enable block-wise uncompressed operations. Hence, we initiate work on compressed…
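The core idea of operating block-wise on compressed data without decompression can be illustrated with a minimal sketch of value-based column compression. The representation and function names below are illustrative assumptions, not the paper's actual API: each column is stored as a map from distinct non-zero values to their row-index lists, and a matrix-vector product is computed directly on that compressed form.

```python
import numpy as np

def compress_columns(M):
    """Hypothetical value-based compression: per column, map each
    distinct non-zero value to the list of rows where it occurs."""
    cols = []
    for j in range(M.shape[1]):
        groups = {}
        for i, val in enumerate(M[:, j]):
            if val != 0.0:
                groups.setdefault(val, []).append(i)
        cols.append(groups)
    return cols

def compressed_matvec(cols, v, n_rows):
    """y = M @ v computed directly on the compressed columns:
    each (value, row-list) pair contributes value * v[j] to the
    listed rows (row indices are unique within each pair)."""
    y = np.zeros(n_rows)
    for j, groups in enumerate(cols):
        for val, rows in groups.items():
            y[rows] += val * v[j]
    return y

M = np.array([[1.0, 0.0],
              [1.0, 2.0],
              [0.0, 2.0]])
v = np.array([3.0, 4.0])
print(compressed_matvec(compress_columns(M), v, 3))  # matches M @ v
```

The point of the sketch is that the cost of the multiply scales with the number of (value, row-list) pairs rather than the number of non-zeros, which is why matrices with few distinct values per column compress and multiply efficiently.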


## 39 Citations

Compressed linear algebra for large-scale machine learning

- Computer Science, The VLDB Journal
- 2017

This work initiates value-based compressed linear algebra (CLA), in which heterogeneous, lightweight database compression techniques are applied to matrices, and then linear algebra operations such as matrix-vector multiplication are executed directly on the compressed representation.

Scaling Machine Learning via Compressed Linear Algebra

- Computer Science, SGMD
- 2017

Large-scale machine learning algorithms are often iterative, using repeated read-only data access and I/O-bound matrix-vector multiplications to converge to an optimal model; effective column compression schemes, cache-conscious operations, and an efficient sampling-based compression algorithm are therefore needed.

Compressed linear algebra for declarative large-scale machine learning

- Computer Science, Commun. ACM
- 2019

This work introduces Compressed Linear Algebra (CLA) for lossless matrix compression, which encodes matrices with lightweight, value-based compression techniques and executes linear algebra operations directly on the compressed representations.

Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent

- Computer Science, SIGMOD Conference
- 2019

This work proposes a new lossless compression scheme called tuple-oriented compression (TOC) that is inspired by an unlikely source, the string/text compression scheme Lempel-Ziv-Welch, but tailored to mini-batch stochastic gradient descent in a way that preserves tuple boundaries within mini-batches.

FlashR: parallelize and scale R for machine learning using SSDs

- Computer Science, PPoPP
- 2018

Despite the huge performance gap between SSDs and RAM, FlashR on SSDs closely tracks the performance of FlashR in memory for many algorithms, and the R implementations in FlashR outperform H2O and Spark MLlib by a factor of 3 to 20.

Technical Perspective: Scaling Machine Learning via Compressed Linear Algebra

- Computer Science, SGMD
- 2017

The paper cleverly adapts ideas first developed in relational database systems — column-oriented compression, sampling-based cost estimation, trading between compression speed and compression rate — to build an elegant implementation of compressed linear algebra operations.

Lightweight Data Compression Algorithms: An Experimental Survey (Experiments and Analyses)

- Computer Science, EDBT
- 2017

This work conducted an exhaustive experimental survey by evaluating several state-of-the-art compression algorithms as well as cascades of basic techniques, finding that there is no single-best algorithm.

Beyond Straightforward Vectorization of Lightweight Data Compression Algorithms for Larger Vector Sizes

- Computer Science, Grundlagen von Datenbanken
- 2018

A novel implementation concept for run-length encoding using conflict-detection operations which have been introduced in Intel’s AVX-512 SIMD extension is presented and different data layouts for vectorization and their impact on wider vector sizes are investigated.

BlockJoin: Efficient Matrix Partitioning Through Joins

- Computer Science, Proc. VLDB Endow.
- 2017

BlockJoin is presented, a distributed join algorithm which directly produces block-partitioned results and applies database techniques known from columnar processing, such as index-joins and late materialization, in the context of parallel dataflow engines.

Low Level Big Data Compression

- Computer Science, KDIR
- 2018

This work proposes a mechanism for storing and processing categorical information by compression at the bit level, along with compression and decompression by blocks, so that processing the compressed information resembles processing the original information.

## References

Showing 1-10 of 84 references

An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication

- Computer Science, IEEE Transactions on Parallel and Distributed Systems
- 2013

A compressed storage format, called Compressed Sparse eXtended (CSX), detects and simultaneously encodes multiple commonly encountered substructures inside a sparse matrix, considerably reducing the memory footprint of sparse matrices and alleviating pressure on the memory subsystem.

SLACID - sparse linear algebra in a column-oriented in-memory database system

- Computer Science, SSDBM '14
- 2014

This paper presents and compares different approaches of storing sparse matrices in an in-memory column-oriented database system and shows that a system layout derived from the compressed sparse row representation integrates well with a columnar database design and is moreover amenable to a wide range of non-numerical use cases when dictionary encoding is used.

Optimizing sparse matrix-vector multiplication using index and value compression

- Computer Science, CF '08
- 2008

This paper proposes two distinct compression methods targeting index and numerical values, respectively, and demonstrates that the index compression method applies successfully to a wide range of matrices, while the value compression method achieves impressive speedups in a more limited yet important class of sparse matrices that contain a small number of distinct values.

On optimizing machine learning workloads via kernel fusion

- Computer Science, PPoPP
- 2015

An analytical model is presented that considers input data characteristics and available GPU resources to estimate near-optimal settings for kernel launch parameters and demonstrates the effectiveness of the fused kernel approach in improving end-to-end performance on an entire ML algorithm.

An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs

- Computer Science, ICS '14
- 2014

A new blocked row-column (BRC) storage format with a novel two-dimensional blocking mechanism that effectively addresses the challenges of SpMV on GPUs: it reduces thread divergence by reordering and grouping rows of the input matrix with nearly equal numbers of non-zero elements onto the same execution units (i.e., warps).

Implementing sparse matrix-vector multiplication on throughput-oriented processors

- Computer Science, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
- 2009

This work explores SpMV methods that are well-suited to throughput-oriented architectures like the GPU and which exploit several common sparsity classes, including structured grid and unstructured mesh matrices.

Super-Scalar RAM-CPU Cache Compression

- Computer Science, 22nd International Conference on Data Engineering (ICDE'06)
- 2006

This work proposes three new versatile compression schemes (PDICT, PFOR, and PFOR-DELTA) that are specifically designed to extract maximum IPC from modern CPUs and compares these algorithms with compression techniques used in (commercial) database and information retrieval systems.

Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Systems

- Computer Science, BTW
- 2017

An exhaustive evaluation indicates that Gilbert is able to process varying amounts of data exceeding the memory of a single computer on clusters of different sizes and simplifies the development process significantly due to its high-level programming abstraction.

Dictionary-based order-preserving string compression for main memory column stores

- Computer Science, SIGMOD Conference
- 2009

This paper proposes new data structures that efficiently support order-preserving dictionary compression for (variable-length) string attributes with a large domain size that is likely to change over time, and introduces a novel indexing approach that provides efficient access paths to such a dictionary while compressing the index data.

Compressed Nonnegative Matrix Factorization Is Fast and Accurate

- Computer Science, IEEE Transactions on Signal Processing
- 2016

This work proposes to use structured random compression, that is, random projections that exploit the data structure, for two NMF variants, classical and separable, and shows that the resulting compressed techniques are faster than their uncompressed counterparts, vastly reduce memory demands, and do not incur any significant deterioration in performance.