FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters

@inproceedings{Iandola2016FireCaffeNA,
  title={FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters},
  author={Forrest N. Iandola and Matthew W. Moskewicz and K. Ashraf and K. Keutzer},
  booktitle={2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2016},
  pages={2592--2600}
}
  • Forrest N. Iandola, Matthew W. Moskewicz, K. Ashraf, K. Keutzer
  • Published 2016
  • Computer Science
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • Long training times for high-accuracy deep neural networks (DNNs) impede research into new DNN architectures and slow the development of high-accuracy DNNs. In this paper we present FireCaffe, which successfully scales deep neural network training across a cluster of GPUs. We also present a number of best practices to aid in comparing advancements in methods for scaling and accelerating the training of deep neural networks. The speed and scalability of distributed algorithms is almost always…
    Fast Deep Neural Network Training on Distributed Systems and Cloud TPUs
    7
    Towards Scalable Parallel Training of Deep Neural Networks
    6
    Communication Quantization for Data-Parallel Training of Deep Neural Networks
    68
    SCALEDEEP: A scalable compute architecture for learning and evaluating deep networks
    93
    Deep Learning on Large-Scale Muticore Clusters
    2
    swCaffe: A Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight
    11
    Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs
    48
