• Corpus ID: 14271987

FlashMatrix: Parallel, Scalable Data Analysis with Generalized Matrix Operations using Commodity SSDs

Da Zheng, Disa Mhembere, Joshua T. Vogelstein, Carey E. Priebe, Randal C. Burns
FlashMatrix is a matrix-oriented programming framework for general data analysis with a high-level functional programming interface. It scales matrix operations beyond memory capacity by utilizing solid-state drives (SSDs) in a non-uniform memory architecture (NUMA). It provides a small number of generalized matrix operations (GenOps) and reimplements a large number of matrix operations in the R framework with GenOps. As such, it executes R code in parallel and out of core automatically… 
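The idea of a generalized matrix operation can be illustrated with a minimal Python sketch. It is not FlashMatrix's API: the function name gen_inner_prod, its parameters, and the block size are hypothetical, chosen only to show how one operation skeleton, parameterized by user-supplied functions and applied one row block at a time, can stand in for several concrete matrix operations.

```python
from functools import reduce
from operator import add, mul


def gen_inner_prod(a, b, combine, aggregate, block_rows=2):
    """Generalized inner product of a (n x k) and b (k x m), given as lists of rows.

    combine(x, y) takes the place of multiplication and aggregate(u, v) takes
    the place of addition. Rows of `a` are processed in blocks of `block_rows`
    to mimic streaming a tall matrix from slow storage one partition at a time.
    (Hypothetical helper for illustration; not part of FlashMatrix.)
    """
    cols_b = list(zip(*b))  # columns of b, assumed small enough to stay in memory
    out = []
    for start in range(0, len(a), block_rows):
        block = a[start:start + block_rows]  # one partition of a
        for row in block:
            out.append([
                reduce(aggregate, (combine(x, y) for x, y in zip(row, col)))
                for col in cols_b
            ])
    return out


a = [[1, 2], [3, 4], [5, 6]]
b = [[1, 0], [0, 1]]

# Ordinary matrix multiplication: multiply element pairs, then sum.
product = gen_inner_prod(a, b, mul, add)

# Same skeleton, different functions: L1 distance between rows of a and columns of b.
l1_dist = gen_inner_prod(a, b, lambda x, y: abs(x - y), add)
```

Because only the two functions change between the calls, a framework built this way needs to parallelize and stream the skeleton once, and every operation expressed through it inherits that behavior.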
Geometric Dimensionality Reduction for Subsequent Classification
This work proves, and substantiates with synthetic and real data experiments, that LOL leads to a better representation of the data for subsequent classification than other linear approaches, while adding negligible computational cost.
Discovering and deciphering relationships across disparate data modalities
The approach, ‘Multiscale Graph Correlation’ (MGC), is a dependence test that juxtaposes disparate data science techniques, including k-nearest neighbors, kernel methods, and multiscale analysis, and uniquely characterizes the latent geometry underlying the relationship while maintaining computational efficiency.
Toward Community-Driven Big Open Brain Science: Open Big Data and Tools for Structure, Function, and Genetics.
Brain science can be further democratized by harnessing the power of community-driven tools, which are both built by and benefit from many different people with different backgrounds and expertise, and which enable collaborations across previously siloed communities.
Linear Optimal Low Rank Projection for High-Dimensional Multi-class Data
This work describes an approach, "Linear Optimal Low-rank" projection (LOL), which extends PCA by incorporating the class labels in a fashion that is advantageous over existing supervised dimensionality reduction techniques, and proves that LOL leads to a better representation of the data for subsequent classification than other linear approaches, while adding negligible computational cost.
Discovering Relationships and their Structures Across Disparate Data Modalities
The key insight is that one can adaptively restrict the analysis to the "jointly local" observations; that is, one can estimate the scales with the most informative neighbors for determining the existence and geometry of a relationship.


FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs
This work demonstrates that a multicore server can process graphs with billions of vertices and hundreds of billions of edges, utilizing commodity SSDs with minimal performance loss by implementing a graph-processing engine on top of a user-space SSD file system designed for high IOPS and extreme parallelism.
Semi-External Memory Sparse Matrix Multiplication on Billion-node Graphs in a Multicore Architecture
This work implements sparse matrix dense matrix multiplication in a semi-external memory (SEM) fashion, i.e., it keeps the sparse matrix on SSDs and dense matrices in memory and achieves performance comparable to the in-memory implementation on a large parallel machine and outperforms the implementations in Trilinos and Intel MKL.
Toward millions of file system IOPS on low-cost, commodity hardware
Da Zheng, R. Burns, A. Szalay. International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2013.
A storage system that removes I/O bottlenecks to achieve more than one million IOPS, based on a user-space file abstraction for arrays of commodity SSDs; it redesigns page caching to eliminate CPU overhead and lock contention on non-uniform memory architecture machines.
External memory algorithms
This tutorial surveys the state of the art in the design and analysis of external memory algorithms (also known as EM algorithms or out-of-core algorithms or I/O algorithms), and discusses a variety of problems and shows how to solve them efficiently in external memory.
An SSD-based eigensolver for spectral analysis on billion-node graphs
An SSD-based eigensolver framework called FlashEigen is developed, which extends Anasazi eigensolvers to SSDs, to compute eigenvalues of a graph with hundreds of millions or even billions of vertices in a single machine.
On the role of burst buffers in leadership-class storage systems
It is shown that burst buffers can accelerate the application-perceived throughput to the external storage system and can reduce the amount of external storage bandwidth required to meet a desired application-perceived bottleneck goal.
A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures
This article shows how the current state of hardware and software allows the programmability problem to be addressed without sacrificing performance.
Dryad: distributed data-parallel programs from sequential building blocks
The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.
AUGEM: Automatically generate high performance Dense Linear Algebra kernels on x86 CPUs
A template-based optimization framework, AUGEM, is presented, which can automatically generate fully optimized assembly code for several dense linear algebra kernels, such as GEMM, GEMV, AXPY and DOT, on varying multi-core CPUs without requiring any manual interference from developers.
Naiad: a timely dataflow system
It is shown that many powerful high-level programming models can be built on Naiad's low-level primitives, enabling such diverse tasks as streaming data analysis, iterative machine learning, and interactive graph mining.