GraphR: Accelerating Graph Processing Using ReRAM

@inproceedings{Song2018GraphRAG,
  title={GraphR: Accelerating Graph Processing Using ReRAM},
  author={Linghao Song and Youwei Zhuo and Xuehai Qian and Hai Helen Li and Yiran Chen},
  booktitle={2018 IEEE International Symposium on High Performance Computer Architecture (HPCA)},
  year={2018},
  pages={531--543}
}
  • Linghao Song, Youwei Zhuo, Xuehai Qian, Hai Helen Li, Yiran Chen
  • Published 21 August 2017
  • Computer Science
  • 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA)
Graph processing has recently received intensive interest in light of a wide range of needs to understand relationships. […] Key Result: The experimental results show that GRAPHR achieves a 16.01× (up to 132.67×) speedup and a 33.82× energy saving on geometric mean compared to a CPU baseline system. Compared to a GPU, GRAPHR achieves a 1.69× to 2.19× speedup and consumes 4.77× to 8.91× less energy. GRAPHR gains a speedup of 1.16× to 4.12× and is 3.67× to 10.96× more energy efficient compared to a PIM-based architecture…
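The key result summarizes per-benchmark speedups and energy savings as geometric means. The parenthetical "up to" figure is the single largest ratio. The minimal Python sketch below shows how a geometric-mean summary is computed. The input values are hypothetical placeholders, not measurements from the paper. Averaging in log space is the usual choice for ratios because one very large ratio dominates it less than it would an arithmetic mean.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive ratios, e.g. per-benchmark speedups."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty list of positive numbers")
    # (r_1 * r_2 * ... * r_n) ** (1/n), computed in log space for stability.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-benchmark speedups, NOT figures from the GraphR paper.
speedups = [2.0, 8.0, 32.0]
print(geometric_mean(speedups), max(speedups))
```

The standard library's `statistics.geometric_mean` gives the same summary if you prefer not to write it by hand.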
A Novel ReRAM-Based Processing-in-Memory Architecture for Graph Traversal
TLDR
This work proposes a new ReRAM-based processing-in-memory architecture called RPBFS, in which graph data can be persistently stored and processed in place and shows a significant performance improvement compared with both the CPU-based and the GPU-based BFS implementations.
GraphSAR: a sparsity-aware processing-in-memory architecture for large-scale graph processing on ReRAMs
TLDR
GraphSAR is presented, a sparsity-aware processing-in-memory accelerator for large-scale graph processing on ReRAMs that achieves a 4.43x energy reduction and a 1.85x speedup over a previous graph processing architecture on ReRAMs.
NodeFetch: High Performance Graph Processing using Processing in Memory
TLDR
NodeFetch, a new method for accessing nodes and their neighbors during graph processing by adding a new command to the HMC system, is proposed as a way of dealing with large-scale graph processing in light of recent advances in the field.
GRAM: graph processing in a ReRAM-based computational memory
TLDR
The proposed solution, GRAM, can efficiently execute the vertex-centric model, which is widely used in large-scale parallel graph processing programs, in the computational memory, and it maximizes computation parallelism while minimizing the number of data movements.
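The vertex-centric model can be illustrated with a generic, synchronous vertex program. The sketch below computes PageRank in Python. In each superstep, every vertex runs the same small program: it sends its rank divided by its out-degree to its neighbors, then updates its own rank from the messages it received. This is a hypothetical illustration of the programming model only, not GRAM's in-memory implementation. The function name, damping factor, and iteration count are assumptions.

```python
def vertex_centric_pagerank(out_edges, num_iters=20, damping=0.85):
    """Synchronous vertex-centric PageRank (hypothetical sketch).

    out_edges maps each vertex ID to a list of its out-neighbors; every
    neighbor must also appear as a key. Rank held by vertices with no
    out-edges is simply dropped in this simplified version.
    """
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(num_iters):
        inbox = {v: 0.0 for v in out_edges}
        # Send phase: each vertex pushes rank / out_degree to its neighbors.
        for v, nbrs in out_edges.items():
            if nbrs:
                share = rank[v] / len(nbrs)
                for u in nbrs:
                    inbox[u] += share
        # Update phase: each vertex recomputes its rank from its messages.
        rank = {v: (1 - damping) / n + damping * inbox[v] for v in out_edges}
    return rank


# Hypothetical 3-vertex cycle: 0 -> 1 -> 2 -> 0.
print(vertex_centric_pagerank({0: [1], 1: [2], 2: [0]}))
```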
ReGra: Accelerating Graph Traversal Applications Using ReRAM With Lower Communication Cost
TLDR
ReGra, a PIM graph traversal accelerator using ReRAM with a lower communication cost, optimizes the graph organization and communication efficiency in graph traversal. It achieves better performance, with a speedup of up to 2.2×.
A Survey on Graph Processing Accelerators: Challenges and Opportunities
TLDR
This paper reviews the relevant techniques in three core components toward a graph processing accelerator: preprocessing, parallel graph computation, and runtime scheduling and finds that there is not an absolute winner for all three aspects in graph acceleration.
Efficient On-Chip Communication for Parallel Graph-Analytics on Spatial Architectures
TLDR
A novel power-law-aware Graph Partitioning and Data Mapping scheme reduces communication latency by minimizing hop counts on a scalable network-on-chip. By reducing data movement time, it makes execution 2–5× faster and 2.7–4× more energy-efficient than a baseline implementation.
LCCG: a locality-centric hardware accelerator for high throughput of concurrent graph processing
TLDR
This paper proposes LCCG, a Locality-Centric programmable accelerator that augments the many-core processor for achieving higher throughput of Concurrent Graph processing jobs and develops a novel topology-aware execution approach into the accelerator design to regularize the graph traversals for multiple jobs on-the-fly according to the graph topology.
An efficient graph accelerator with parallel data conflict management
TLDR
AccuGraph is architected, a novel graph-specific accelerator that can simultaneously process atomic vertex updates for massive parallelism while ensuring correctness. Its implementation on a Xilinx FPGA, evaluated with a wide variety of typical graph algorithms, shows that the accelerator achieves an average throughput of 2.36 GTEPS.
GraphiDe: A Graph Processing Accelerator leveraging In-DRAM-Computing
TLDR
The extensive circuit-architecture simulations over three social network data-sets indicate that GraphiDe achieves on average a 3.1x energy-efficiency improvement and a 4.2x speed-up over a recent DRAM-based PIM platform.
...

References

SHOWING 1-10 OF 79 REFERENCES
Graphicionado: A high-performance and energy-efficient accelerator for graph analytics
TLDR
Graphicionado augments the vertex programming paradigm, allowing different graph analytics applications to be mapped to the same accelerator framework, while maintaining flexibility through a small set of reconfigurable blocks, for high-performance, energy-efficient processing of graph analytics workloads.
A scalable processing-in-memory accelerator for parallel graph processing
TLDR
This work argues that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve memory-capacity-proportional performance and designs a programmable PIM accelerator for large-scale graph processing called Tesseract.
Energy Efficient Architecture for Graph Analytics Accelerators
TLDR
This paper proposes a configurable architecture template that is specifically optimized for iterative vertex-centric graph applications with irregular access patterns and asymmetric convergence and addresses the limitations of the existing multi-core CPU and GPU architectures for these types of applications.
GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks
TLDR
GraphPIM is presented, a full-stack solution for graph computing that achieves higher performance using PIM functionality and an extension to PIM operations that can further bring performance benefits for more graph applications.
PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs
TLDR
It is argued that skewed distributions in natural graphs also necessitate differentiated processing on high-degree and low-degree vertices, and PowerLyra, a new distributed graph processing system that embraces the best of both worlds of existing graph-parallel systems is introduced.
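Differentiated treatment of high- and low-degree vertices can be pictured as a degree-threshold rule for placing edges, in the spirit of PowerLyra's hybrid-cut. Under this rule, all in-edges of a low-degree vertex stay together on one partition, while the in-edges of a high-degree vertex are spread out according to their source. The Python sketch below is a simplified illustration under stated assumptions, not PowerLyra's code. The `threshold`, the use of Python's `hash` on integer vertex IDs, and the function name are hypothetical choices.

```python
from collections import Counter


def hybrid_cut(edges, num_parts, threshold):
    """Assign directed edges (src, dst) to partitions (hypothetical sketch).

    Destinations with in-degree <= threshold are treated as low-degree;
    the rest are treated as high-degree.
    """
    in_degree = Counter(dst for _, dst in edges)
    parts = [[] for _ in range(num_parts)]
    for src, dst in edges:
        if in_degree[dst] <= threshold:
            # Low-degree: place by destination so all its in-edges are local.
            p = hash(dst) % num_parts
        else:
            # High-degree: place by source to spread its many in-edges.
            p = hash(src) % num_parts
        parts[p].append((src, dst))
    return parts


# Hypothetical graph: vertex 0 is a hub with 6 in-edges; vertex 7 has 2.
edges = [(s, 0) for s in range(1, 7)] + [(1, 7), (2, 7)]
print(hybrid_cut(edges, num_parts=3, threshold=2))
```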
Proxy-Guided Load Balancing of Graph Processing Workloads on Heterogeneous Clusters
TLDR
This paper proposes a profiling methodology leveraging synthetic graphs for capturing a node's computational capability and guiding graph partitioning in heterogeneous environments with minimal overheads and shows that by sampling the execution of applications on synthetic graphs following a power-law distribution, the computing capabilities of heterogeneous clusters can be captured accurately.
Gunrock: a high-performance graph processing library on the GPU
TLDR
"Gunrock," the high-level bulk-synchronous graph-processing system targeting the GPU, takes a new approach to abstracting GPU graph analytics: rather than designing an abstraction around computation, Gunrock implements a novel data-centric abstraction centered on operations on a vertex or edge frontier.
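A frontier-centric traversal can be sketched as a level-synchronous BFS built from two steps on the frontier. An advance step expands the frontier vertices along their edges, and a filter step keeps only the vertices not yet visited. Gunrock's own operators include ones named advance and filter. The code below is a sequential Python sketch of the idea, not Gunrock's GPU API. The adjacency-dict input format and the function name are assumptions.

```python
def frontier_bfs(adj, source):
    """Level-synchronous BFS over a vertex frontier (hypothetical sketch).

    adj maps a vertex to a list of its neighbors. Returns a dict mapping
    each reachable vertex to its BFS depth from source.
    """
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        # Advance: expand every frontier vertex along its outgoing edges.
        candidates = [u for v in frontier for u in adj.get(v, [])]
        # Filter: keep only unvisited vertices, each at most once.
        next_frontier = []
        for u in candidates:
            if u not in depth:
                depth[u] = level
                next_frontier.append(u)
        frontier = next_frontier
    return depth


# Hypothetical small graph.
print(frontier_bfs({0: [1, 2], 1: [3], 2: [3], 3: [4]}, 0))
```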
LightGraph: Lighten Communication in Distributed Graph-Parallel Processing
TLDR
A mechanism is proposed that identifies and eliminates avoidable communication during synchronization in existing distributed graph-structured computing abstractions. It is implemented on PowerGraph to create LightGraph, which reduces communication overhead in distributed graph-parallel computation systems.
Practical Near-Data Processing for In-Memory Analytics Frameworks
TLDR
This paper develops the hardware and software of an NDP architecture for in-memory analytics frameworks, including MapReduce, graph processing, and deep neural networks, and shows that it is critical to optimize software frameworks for spatial locality, as this leads to 2.9x efficiency improvements for NDP.
X-Stream: edge-centric graph processing using streaming partitions
TLDR
X-Stream is novel in using an edge-centric rather than a vertex-centric implementation of this model, and streaming completely unordered edge lists rather than performing random access, and competes favorably with existing systems for graph processing.
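X-Stream builds on a scatter-gather model. The scatter phase streams over edges and emits updates, and the gather phase applies those updates to vertex state. The Python sketch below computes weakly connected components by min-label propagation in that style. It makes a sequential pass over the unordered edge list instead of looking up each vertex's neighbors. It omits X-Stream's streaming partitions and shuffle phase. The function name and input format are assumptions.

```python
def edge_centric_wcc(num_vertices, edges):
    """Weakly connected components via edge-centric scatter-gather.

    edges is an unordered list of (src, dst) pairs over vertex IDs
    0..num_vertices-1. Each component ends up labeled by its smallest ID.
    """
    label = list(range(num_vertices))
    changed = True
    while changed:
        # Scatter: one sequential pass over all edges, in stored order,
        # emitting an update for the endpoint with the larger label.
        updates = []
        for src, dst in edges:
            if label[src] < label[dst]:
                updates.append((dst, label[src]))
            elif label[dst] < label[src]:
                updates.append((src, label[dst]))
        # Gather: apply updates to vertex state; labels only ever decrease.
        changed = False
        for v, new_label in updates:
            if new_label < label[v]:
                label[v] = new_label
                changed = True
    return label


# Hypothetical unordered edge list with two components: {0,1,2} and {3,4,5}.
print(edge_centric_wcc(6, [(4, 5), (1, 0), (2, 1), (5, 3)]))
```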
...