Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect

@article{Li2020EvaluatingMG,
  title={Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect},
  author={Ang Li and Shuaiwen Song and Jieyang Chen and Jiajia Li and X. Liu and Nathan R. Tallent and K. Barker},
  journal={IEEE Transactions on Parallel and Distributed Systems},
  year={2020},
  volume={31},
  pages={94--110}
}
  • Ang Li, Shuaiwen Song, Jieyang Chen, Jiajia Li, X. Liu, Nathan R. Tallent, K. Barker
  • Published 2020
  • Computer Science
  • IEEE Transactions on Parallel and Distributed Systems
  • High-performance multi-GPU computing has become an inevitable trend due to the ever-increasing demand for computation capability in emerging domains such as deep learning, big data, and planet-scale simulations. [...] Key result: These observations indicate that, for an application running in a multi-GPU node, choosing the right GPU combination can considerably affect GPU communication efficiency, as well as the application's overall performance.
    Scalable Deep Learning on Distributed Infrastructures
    • 11
    Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems
    • 6
    Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools.
    • 5
    Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs
    • 3
    Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects
    • 1
    Large-Scale Discrete Fourier Transform on TPUs
    • 2
    iFDK: a scalable framework for instant high-resolution image reconstruction
    • 1
