M3: Scaling Up Machine Learning via Memory Mapping

Dezhi Fang and Duen Horng Chau
arXiv: Learning
To process data that do not fit in RAM, conventional wisdom would suggest using distributed approaches. However, recent research has demonstrated virtual memory's strong potential in scaling up graph mining algorithms on a single machine. We propose to use a similar approach for general machine learning. We contribute: (1) our latest finding that memory mapping is also a feasible technique for scaling up general machine learning algorithms like logistic regression and k-means, when data fits in…
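To make the idea concrete, here is a minimal sketch of memory mapping in Python. It is not the authors' implementation. It assumes a hypothetical flat binary file of little-endian float64 values in row-major order, and the helper names (`write_matrix`, `column_means`, `demo`) are invented for illustration. The per-column mean is the kind of full-data pass that a k-means centroid update performs; the operating system pages the file in on demand, so the code never loads the whole dataset explicitly.

```python
import mmap
import os
import struct
import tempfile


def write_matrix(path, rows):
    """Write rows of float64 values to a flat binary file (row-major)."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(struct.pack(f"<{len(row)}d", *row))


def column_means(path, n_cols):
    """Compute per-column means by scanning a memory-mapped file.

    The file is mapped read-only; rows are decoded one at a time, so
    only the pages touched by the scan need to be resident in RAM.
    """
    row_bytes = 8 * n_cols  # 8 bytes per float64
    sums = [0.0] * n_cols
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            n_rows = len(mm) // row_bytes
            for i in range(n_rows):
                row = struct.unpack_from(f"<{n_cols}d", mm, i * row_bytes)
                for j, value in enumerate(row):
                    sums[j] += value
    return [s / n_rows for s in sums]


def demo():
    """Tiny illustrative dataset; a real one would be far larger than this."""
    rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        write_matrix(path, rows)
        return column_means(path, 2)
```

Calling `demo()` writes three rows to a temporary file and returns the column means computed through the mapping. The same access pattern applies unchanged to a file larger than RAM, although throughput then depends on the storage device and the operating system's paging behavior.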


Scalability! But at what COST?
This work surveys measurements of data-parallel systems recently reported in SOSP and OSDI, and finds that many systems have either a surprisingly large COST, often hundreds of cores, or simply underperform one thread for all of their reported configurations.
MMap: Fast billion-scale graph computation on a PC via memory mapping
A minimalist approach to graph computation that forgoes sophisticated data structures and memory management strategies, by leveraging the fundamental memory mapping (MMap) capability found on operating systems.
MLPACK: a scalable C++ machine learning library
MLPACK is a state-of-the-art, scalable, multi-platform C++ machine learning library providing cutting-edge algorithms whose benchmarks exhibit far better performance than other leading machine learning libraries.
An Analysis of Linux Scalability to Many Cores
There is no scalability reason to give up on traditional operating system organizations just yet, according to this analysis of seven system applications running on Linux on a 48-core computer.