• Corpus ID: 30149560

Parallelizing Big Data Machine Learning Applications with Model Rotation

Bingjing Zhang, Bo Peng, Judy Qiu
This paper proposes model rotation as a general approach to parallelize big data machine learning applications. To solve the big model problem in parallelization, we distribute the model parameters to inter-node workers and rotate different model parts in a ring topology. The advantage of model rotation comes from maximizing the effect of parallel model updates for algorithm convergence while minimizing the overhead of communication. We formulate a solution using computation models, programming… 
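The ring rotation described in the abstract can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `run_rotation_epoch`, the scalar "shards", and the additive update rule are all hypothetical stand-ins for real model parts and real training updates.

```python
from typing import List


def run_rotation_epoch(local_updates: List[float], shards: List[float]) -> List[float]:
    """Simulate one full rotation of model shards around a ring of workers.

    local_updates[w] is a hypothetical contribution that worker w adds to
    whichever shard it currently holds (a stand-in for training on local data).
    shards[i] is a scalar stand-in for the i-th part of the model.
    """
    num_workers = len(shards)
    # holder[w] = index of the shard currently held by worker w.
    holder = list(range(num_workers))

    for _ in range(num_workers):
        # Compute phase: every worker updates a different shard, so the
        # updates never conflict and could run in parallel.
        for w in range(num_workers):
            shards[holder[w]] += local_updates[w]

        # Rotation phase: each worker passes its shard to the next worker
        # in the ring, so worker w receives the shard worker w - 1 held.
        holder = [holder[(w - 1) % num_workers] for w in range(num_workers)]

    return shards


if __name__ == "__main__":
    print(run_rotation_epoch([1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

After as many steps as there are workers, every worker has held every shard exactly once, so each shard has received each worker's contribution. Only one shard per worker moves per step, which is the communication pattern the abstract attributes to the ring topology.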

HarpGBDT: Optimizing Gradient Boosting Decision Tree for Parallel Efficiency

A new tree growth method is adopted that selects the top K candidate tree nodes, enabling more levels of parallelism without sacrificing the algorithm’s accuracy, and HarpGBDT, a new GBDT system designed from the perspective of parallel efficiency optimization, is proposed.

Learning Everywhere: Pervasive Machine Learning for Effective High-Performance Computation

  • G. Fox, J. Glazier, S. Jha
  • Computer Science
    2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
  • 2019
The concept of "effective performance" that one can achieve by combining learning methodologies with simulation-based approaches is introduced, and the interaction between traditional HPC and ML approaches is described.



STRADS: a distributed framework for scheduled model parallel machine learning

Scheduled model parallelism (SchMP), a programming approach that improves ML algorithm convergence speed by efficiently scheduling parameter updates, taking into account parameter dependencies and uneven convergence, is proposed.

On Model Parallelization and Scheduling Strategies for Distributed Machine Learning

STRADS, a system for model parallelism, provides a programming abstraction for scheduling parameter updates by discovering and leveraging changing structural properties of ML programs; this enables a flexible tradeoff between scheduling efficiency and fidelity to intrinsic dependencies within the models, and improves the memory efficiency of distributed ML.

Model-centric computation abstractions in machine learning applications

This work sets up parallel machine learning as a combination of training data-centric and model parameter-centric processing and shows that an efficient parallel model update pipeline can achieve similar or higher model convergence speed compared with other work.

Parallel Inference for Latent Dirichlet Allocation on Graphics Processing Units

A novel data partitioning scheme is proposed that effectively reduces the memory cost of parallelizing two inference methods for latent Dirichlet allocation models on GPUs: collapsed Gibbs sampling and collapsed variational Bayesian inference.

Distributed Matrix Completion

The DALS, ASGD, and DSGD++ algorithms are novel variants of the popular alternating least squares and stochastic gradient descent algorithms that exploit thread-level parallelism, in-memory processing, and asynchronous communication.

Scaling Distributed Machine Learning with the Parameter Server

Views on newly identified challenges are shared, and application scenarios such as micro-blog data analysis and data processing for building next-generation search engines are covered.

Petuum: A New Platform for Distributed Machine Learning on Big Data

This work proposes a general-purpose framework, Petuum, that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions.

Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent

This work aims to show, using novel theoretical analysis, algorithms, and implementation, that SGD can be implemented without any locking, and presents an update scheme called HOGWILD! that allows processors to access shared memory with the possibility of overwriting each other's work.
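The idea of unsynchronized updates to shared parameters can be sketched with plain Python threads. This is an illustrative toy, not the HOGWILD! implementation: the function name `hogwild_sgd`, the one-hot sample format `(index, target)`, and all hyperparameters are assumptions. CPython's global interpreter lock also limits true parallelism, so the sketch shows only the lock-free access pattern, not the speedup.

```python
import random
import threading
from typing import List, Tuple


def hogwild_sgd(samples: List[Tuple[int, float]], dim: int,
                num_threads: int = 4, epochs: int = 20,
                lr: float = 0.1) -> List[float]:
    """Fit weights[j] to targets y from one-hot samples (j, y) without locks."""
    # Shared parameter vector, deliberately not protected by any lock.
    weights = [0.0] * dim

    def worker(seed: int) -> None:
        rng = random.Random(seed)  # per-thread random stream
        for _ in range(epochs * len(samples)):
            j, y = rng.choice(samples)
            # Gradient of 0.5 * (weights[j] - y) ** 2 with respect to weights[j].
            grad = weights[j] - y
            # Unsynchronized read-modify-write: another thread may have
            # changed weights[j] in between, and this write may overwrite it.
            weights[j] -= lr * grad

    threads = [threading.Thread(target=worker, args=(seed,))
               for seed in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return weights


if __name__ == "__main__":
    data = [(0, 1.0), (1, -2.0), (2, 3.0)]
    print(hogwild_sgd(data, dim=3))
```

Because each sample touches a single coordinate, conflicting writes are rare and an overwritten update only delays progress toward the target, which is the sparse-update setting in which lock-free SGD is usually motivated.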

A fast parallel SGD for matrix factorization in shared memory systems

A fast parallel SGD method, FPSGD, for shared memory systems is developed by dramatically reducing the cache-miss rate and carefully addressing the load balance of threads, which is more efficient than state-of-the-art parallel algorithms for matrix factorization.

Large-scale matrix factorization with distributed stochastic gradient descent

A novel algorithm to approximately factor large matrices with millions of rows, millions of columns, and billions of nonzero elements, called DSGD, that can be fully distributed and run on web-scale datasets using, e.g., MapReduce.