Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect

  • Ang Li, Shuaiwen Song, Jieyang Chen, Jiajia Li, X. Liu, Nathan R. Tallent, K. Barker
  • Published 2020
  • Computer Science
  • IEEE Transactions on Parallel and Distributed Systems
  • High-performance multi-GPU computing has become an inevitable trend due to the ever-increasing demand for computational capability in emerging domains such as deep learning, big data, and planet-scale simulations. [...] Key result: these observations indicate that, for an application running on a multi-GPU node, choosing the right GPU combination can have a considerable impact on GPU communication efficiency, as well as on the application's overall performance.
    29 Citations
    Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects
    Speeding up Collective Communications Through Inter-GPU Re-Routing
    Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs
    Distributed Join Algorithms on Multi-CPU Clusters with GPUDirect RDMA
    ASTRA-SIM: Enabling SW/HW Co-Design Exploration for Distributed DL Training Platforms
    An Overview of Efficient Interconnection Networks for Deep Neural Network Accelerators
    Intelligent Data Placement on Discrete GPU Nodes with Unified Memory


    Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite
    GPU-Centric Communication on NVIDIA GPU Clusters with InfiniBand: A Case Study with OpenSHMEM
    GPU-Aware MPI on RDMA-Enabled Clusters: Design, Implementation and Evaluation
    GPUDirect Async: Exploring GPU synchronous communication techniques for InfiniBand clusters
    MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters
    Fine-Grained Synchronizations and Dataflow Programming on GPUs
    Memory access patterns: the missing piece of the multi-GPU puzzle
    Multi-GPU System Design with Memory Networks