• Corpus ID: 253255506

Analysis and Optimization of GNN-Based Recommender Systems on Persistent Memory

  • Yuwei Hu, Jiajie Li, Zhongming Yu, Zhiru Zhang
Graph neural networks (GNNs), which have emerged as an effective method for handling machine learning tasks on graphs, bring a new approach to building recommender systems, where the task of recommendation can be formulated as the link prediction problem on user-item bipartite graphs. Training GNN-based recommender systems (GNNRecSys) on large graphs incurs a large memory footprint, easily exceeding the DRAM capacity on a typical server. Existing solutions resort to distributed subgraph… 
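The bipartite link-prediction formulation above can be sketched in a few lines. This is an illustrative toy, not the paper's system: a real GNNRecSys would use learned message-passing layers, whereas here a single degree-normalized mean-aggregation step stands in for a GNN layer, and a dot product scores candidate user-item edges.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 4, 5, 8

# Bipartite adjacency: edges[u, i] = 1 if user u interacted with item i.
edges = (rng.random((n_users, n_items)) < 0.4).astype(float)

user_emb = rng.normal(size=(n_users, dim))
item_emb = rng.normal(size=(n_items, dim))

# One message-passing step: users aggregate neighboring item embeddings
# (mean over degree), and items aggregate neighboring user embeddings.
deg_u = edges.sum(axis=1, keepdims=True).clip(min=1)
deg_i = edges.sum(axis=0, keepdims=True).clip(min=1)
user_h = user_emb + (edges @ item_emb) / deg_u
item_h = item_emb + (edges.T @ user_emb) / deg_i.T

# Link prediction: score every (user, item) pair with a dot product and
# recommend the highest-scoring item the user has not interacted with.
scores = user_h @ item_h.T
masked = np.where(edges > 0, -np.inf, scores)
recs = masked.argmax(axis=1)
```

At web scale, the `scores` matrix is never materialized densely; the memory-footprint problem the paper targets comes from storing the graph and per-node embeddings for full-graph training.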

DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks

DistGNN is presented, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters via an efficient shared memory implementation, communication reduction using a minimum vertex-cut graph partitioning algorithm and communication avoidance using a family of delayed-update algorithms.

GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs

GNNAdvisor is proposed, an adaptive and efficient runtime system to accelerate various GNN workloads on GPU platforms and incorporates a lightweight analytical model for an effective design parameter search.

FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems

  • Yuwei Hu, Zihao Ye, Yida Wang
  • Computer Science
    SC20: International Conference for High Performance Computing, Networking, Storage and Analysis
  • 2020
FeatGraph incorporates optimizations for graph traversal into its sparse templates, allows users to specify optimizations for UDFs via a feature dimension schedule (FDS), and speeds up end-to-end GNN training and inference by up to 32× on CPU and 7× on GPU.

Graph Convolutional Neural Networks for Web-Scale Recommender Systems

A novel method based on highly efficient random walks to structure the convolutions and a novel training strategy that relies on harder-and-harder training examples to improve robustness and convergence of the model are developed.
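The random-walk idea behind that convolution structure can be illustrated with a short sketch. This is a hedged toy, not the paper's actual algorithm: short random walks from a target node are run, and the most frequently visited nodes are taken as its convolution neighborhood, approximating importance-based neighbor selection.

```python
import random
from collections import Counter

def walk_neighborhood(adj, start, num_walks=200, walk_len=2, top_k=3, seed=0):
    """Return the top_k most-visited nodes over short random walks from start.

    adj: dict mapping each node to a list of its neighbors.
    """
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(num_walks):
        node = start
        for _ in range(walk_len):
            neighbors = adj.get(node, [])
            if not neighbors:
                break
            node = rng.choice(neighbors)
            if node != start:          # do not count the start node itself
                visits[node] += 1
    # Visit frequency serves as an importance score for neighbor selection.
    return [n for n, _ in visits.most_common(top_k)]

# Tiny user-item bipartite graph (hypothetical example data).
adj = {
    "u1": ["i1", "i2"], "u2": ["i2", "i3"],
    "i1": ["u1"], "i2": ["u1", "u2"], "i3": ["u2"],
}
neighborhood = walk_neighborhood(adj, "u1")
```

Visit counts can also weight the aggregation itself, so that frequently reached neighbors contribute more to the convolution.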

P3: Distributed Deep Graph Learning at Scale

This paper presents P3, a system for scaling GNN model training to large real-world graphs in a distributed setting. P3 proposes a new approach to distributed GNN training that eliminates high communication and partitioning overheads, and couples it with a new pipelined push-pull parallelism execution strategy for fast model training.

DistDGL: Distributed Graph Neural Network Training for Billion-Scale Graphs

  • Da Zheng, Chao Ma, G. Karypis
  • Computer Science
    2020 IEEE/ACM 10th Workshop on Irregular Applications: Architectures and Algorithms (IA3)
  • 2020
The results show that DistDGL achieves linear speedup without compromising model accuracy and requires only 13 seconds to complete a training epoch for a graph with 100 million nodes and 3 billion edges on a cluster with 16 machines.

AliGraph: A Comprehensive Graph Neural Network Platform

This paper presents a comprehensive graph neural network system, namely AliGraph, which consists of distributed graph storage, optimized sampling operators and runtime to efficiently support not only existing popular GNNs but also a series of in-house developed ones for different scenarios.

Graph Neural Networks in Recommender Systems: A Survey

This article provides a taxonomy of GNN-based recommendation models according to the types of information used and the recommendation tasks addressed, and systematically analyzes the challenges of applying GNNs to different types of data.

Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads

Dorylus is a distributed system for training GNNs that can take advantage of serverless computing to increase scalability at a low cost and is up to 3.8x faster and 10.7x cheaper compared to existing sampling-based systems.

Software-hardware co-design for fast and scalable training of deep learning recommendation models

This paper presents Neo, a software-hardware co-designed system for high-performance distributed training of large-scale DLRMs that employs a novel 4D parallelism strategy combining table-wise, row-wise, column-wise, and data parallelism for training massive embedding operators in DLRMs.