Corpus ID: 5208262

RABIT: A Reliable Allreduce and Broadcast Interface

  • T. Chen, I. Cano, Tianyi Zhou
  • Published 2015
  • Allreduce is an abstraction commonly used for solving machine learning problems. It is an operation where every node starts with a local value and ends up with an aggregate global result. MPI provides an Allreduce implementation. Though it has been widely adopted, it is somewhat limited; it lacks fault tolerance and cannot run easily on existing systems. In this work, we propose RABIT, an Allreduce library suitable for distributed machine learning algorithms that overcomes the aforementioned…
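
The Allreduce semantics described in the abstract (every node starts with a local value and ends with the same global aggregate) can be illustrated with a small single-process simulation. This is only a sketch of the semantics, not RABIT's or MPI's API: the function name `simulated_allreduce` and the sample values are hypothetical, and no real communication or fault tolerance is modeled.

```python
from functools import reduce
import operator


def simulated_allreduce(local_values, op=operator.add):
    """Return the value each simulated node holds after an Allreduce.

    local_values: one local value per node (hypothetical sample data).
    op: an associative binary reduction such as addition or max.
    """
    # Combine every node's local value into a single global aggregate.
    global_result = reduce(op, local_values)
    # After Allreduce, every node holds the same aggregate result.
    return [global_result for _ in local_values]


# Four simulated nodes, each starting with its own local value.
local_values = [1, 2, 3, 4]
after_sum = simulated_allreduce(local_values)          # sum reduction
after_max = simulated_allreduce(local_values, op=max)  # max reduction
```

In a real distributed setting the nodes would exchange partial results over a network, for example along a tree or ring, rather than sharing one Python list.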

    Block-distributed Gradient Boosted Trees
    A Survey on Large-scale Machine Learning
    XDL: an industrial deep learning framework for high-dimensional sparse data


    Publications referenced by this paper.
    Stochastic gradient boosted distributed decision trees
    A reliable effective terascale linear learning system