# GraphR: Accelerating Graph Processing Using ReRAM

@article{Song2018GraphRAG,
title={GraphR: Accelerating Graph Processing Using ReRAM},
author={Linghao Song and Youwei Zhuo and Xuehai Qian and Hai Helen Li and Yiran Chen},
journal={2018 IEEE International Symposium on High Performance Computer Architecture (HPCA)},
year={2018},
pages={531-543}
}
• Published 21 August 2017
• Computer Science
• 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA)
Graph processing has recently attracted intense interest in light of a wide range of needs to understand relationships. […] The experimental results show that GRAPHR achieves a 16.01× (up to 132.67×) speedup and a 33.82× energy saving on geometric mean compared to a CPU baseline system. Compared to GPU, GRAPHR achieves 1.69× to 2.19× speedup and consumes 4.77× to 8.91× less energy. GRAPHR gains a speedup of 1.16× to 4.12×, and is 3.67× to 10.96× more energy efficient compared to PIM-based architecture…
132 Citations

A Novel ReRAM-Based Processing-in-Memory Architecture for Graph Traversal
• Computer Science
ACM Trans. Storage
• 2018
This work proposes a new ReRAM-based processing-in-memory architecture called RPBFS, in which graph data can be persistently stored and processed in place, and shows a significant performance improvement compared with both the CPU-based and the GPU-based BFS implementations.
GraphSAR: a sparsity-aware processing-in-memory architecture for large-scale graph processing on ReRAMs
• Computer Science
ASP-DAC
• 2019
GraphSAR is presented, a sparsity-aware processing-in-memory accelerator for large-scale graph processing on ReRAMs that achieves a 4.43x energy reduction and a 1.85x speedup against a previous graph processing architecture on ReRAMs.
NodeFetch: High Performance Graph Processing using Processing in Memory
• Computer Science
• 2021
NodeFetch, a new method to access nodes and their neighbors while processing a graph by adding a new command to the HMC system, is proposed as a way of dealing with large-scale graph processing, considering recent advances in the field.
GRAM: graph processing in a ReRAM-based computational memory
• Computer Science
ASP-DAC
• 2019
The proposed solution, GRAM, can efficiently execute the vertex-centric model, which is widely used in large-scale parallel graph processing programs, in the computational memory, and maximizes the computation parallelism while minimizing the number of data movements.
ReGra: Accelerating Graph Traversal Applications Using ReRAM With Lower Communication Cost
• Computer Science
IEEE Access
• 2020
A PIM graph traversal accelerator using ReRAM with a lower communication cost, named ReGra, optimizes the graph organization and communication efficiency in graph traversal and yields a speedup of up to 2.2×.
A Survey on Graph Processing Accelerators: Challenges and Opportunities
• Computer Science
Journal of Computer Science and Technology
• 2019
This paper reviews the relevant techniques in three core components toward a graph processing accelerator: preprocessing, parallel graph computation, and runtime scheduling and finds that there is not an absolute winner for all three aspects in graph acceleration.
Efficient On-Chip Communication for Parallel Graph-Analytics on Spatial Architectures
A novel power-law-aware graph partitioning and data mapping scheme reduces communication latency by minimizing the hop counts on a scalable network-on-chip, making execution 2× to 5× faster and 2.7× to 4× more energy-efficient than a baseline implementation by reducing the data movement time.
LCCG: a locality-centric hardware accelerator for high throughput of concurrent graph processing
• Computer Science
SC
• 2021
This paper proposes LCCG, a Locality-Centric programmable accelerator that augments the many-core processor to achieve higher throughput for Concurrent Graph processing jobs, and builds a novel topology-aware execution approach into the accelerator design to regularize the graph traversals for multiple jobs on-the-fly according to the graph topology.
An efficient graph accelerator with parallel data conflict management
AccuGraph is architected, a novel graph-specific accelerator that can simultaneously process atomic vertex updates for massive parallelism while ensuring correctness, and its implementation on a Xilinx FPGA with a wide variety of typical graph algorithms shows that the accelerator achieves an average throughput of 2.36 GTEPS.
GraphiDe: A Graph Processing Accelerator leveraging In-DRAM-Computing
• Computer Science
ACM Great Lakes Symposium on VLSI
• 2019
The extensive circuit-architecture simulations over three social network data-sets indicate that GraphiDe achieves on average 3.1x energy-efficiency improvement and 4.2x speed-up over the recent DRAM based PIM platform.

## References

Graphicionado: A high-performance and energy-efficient accelerator for graph analytics
• Computer Science
2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
• 2016
Graphicionado augments the vertex programming paradigm, allowing different graph analytics applications to be mapped to the same accelerator framework, while maintaining flexibility through a small set of reconfigurable blocks, for high-performance, energy-efficient processing of graph analytics workloads.
A scalable processing-in-memory accelerator for parallel graph processing
• Computer Science
2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)
• 2015
This work argues that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve memory-capacity-proportional performance and designs a programmable PIM accelerator for large-scale graph processing called Tesseract.
Energy Efficient Architecture for Graph Analytics Accelerators
• Computer Science
2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)
• 2016
This paper proposes a configurable architecture template that is specifically optimized for iterative vertex-centric graph applications with irregular access patterns and asymmetric convergence and addresses the limitations of the existing multi-core CPU and GPU architectures for these types of applications.
• Computer Science
2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)
• 2017
GraphPIM is presented, a full-stack solution for graph computing that achieves higher performance using PIM functionality and an extension to PIM operations that can further bring performance benefits for more graph applications.
PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs
• Computer Science
TOPC
• 2019
It is argued that skewed distributions in natural graphs also necessitate differentiated processing on high-degree and low-degree vertices, and PowerLyra, a new distributed graph processing system that embraces the best of both worlds of existing graph-parallel systems, is introduced.
• Computer Science
2016 45th International Conference on Parallel Processing (ICPP)
• 2016
This paper proposes a profiling methodology leveraging synthetic graphs for capturing a node's computational capability and guiding graph partitioning in heterogeneous environments with minimal overheads and shows that by sampling the execution of applications on synthetic graphs following a power-law distribution, the computing capabilities of heterogeneous clusters can be captured accurately.
Gunrock: a high-performance graph processing library on the GPU
• Computer Science
PPoPP
• 2015
"Gunrock," the high-level bulk-synchronous graph-processing system targeting the GPU, takes a new approach to abstracting GPU graph analytics: rather than designing an abstraction around computation, Gunrock implements a novel data-centric abstraction centered on operations on a vertex or edge frontier.
LightGraph: Lighten Communication in Distributed Graph-Parallel Processing
• Computer Science
2014 IEEE International Congress on Big Data
• 2014
A mechanism is proposed that identifies and eliminates the avoidable communication during synchronization in existing distributed graph-structured computing abstractions; it is implemented on PowerGraph to create LightGraph, which reduces communication overhead in distributed graph-parallel computation systems.
Practical Near-Data Processing for In-Memory Analytics Frameworks
• Computer Science
2015 International Conference on Parallel Architecture and Compilation (PACT)
• 2015
This paper develops the hardware and software of an NDP architecture for in-memory analytics frameworks, including MapReduce, graph processing, and deep neural networks, and shows that it is critical to optimize software frameworks for spatial locality as it leads to 2.9x efficiency improvements for NDP.
X-Stream: edge-centric graph processing using streaming partitions
• Computer Science
SOSP
• 2013
X-Stream is novel in using an edge-centric rather than a vertex-centric implementation of this model, and streaming completely unordered edge lists rather than performing random access, and competes favorably with existing systems for graph processing.