The Rocket Chip Generator
Rocket Chip is an open-source System-on-Chip design generator that emits synthesizable RTL. It leverages the Chisel hardware construction language to compose a library of sophisticated generators for ...
The GAP Benchmark Suite
TLDR
A graph processing benchmark suite that not only specifies graph kernels, input graphs, and evaluation methodologies, but also provides optimized baseline implementations that can serve as a representative graph processing workload.
Direction-optimizing Breadth-First Search
TLDR
A hybrid approach, advantageous for low-diameter graphs, that combines a conventional top-down algorithm with a novel bottom-up algorithm. The bottom-up algorithm can dramatically reduce the number of edges examined, which accelerates the search as a whole.
Silicon-photonic Clos networks for global on-chip communication
TLDR
Analytical modeling is used to show that a 64-tile photonic Clos network consumes significantly less optical power, thermal tuning power, and area than global photonic crossbars over a range of photonic device parameters.
Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server
TLDR
Characterizing high-performance graph algorithm codebases with hardware performance counters on a conventional dual-socket server shows that there is substantial room for a different processor architecture to improve performance without requiring a new memory system.
Reducing PageRank Communication via Propagation Blocking
TLDR
This work presents propagation blocking, an optimization to improve spatial locality, demonstrates its application to PageRank, and shows how it could be generalized to SpMV (sparse matrix-vector multiplication) or other graph programming models.
Distributed Memory Breadth-First Search Revisited: Enabling Bottom-Up Search
TLDR
This work presents a scalable distributed-memory parallelization of BFS with bottom-up search and achieves a performance rate of over 240 billion edges per second on 115 thousand cores of a Cray XE6, which makes it over 7× faster than a conventional top-down algorithm using the same set of optimizations and data distribution.
Re-architecting DRAM memory systems with monolithically integrated silicon photonics
TLDR
This work redesigns the DRAM main memory system using a proposed monolithically integrated silicon photonics technology and shows that the photonically interconnected DRAM (PIDRAM) provides a promising solution to all of these issues.
Searching for a Parent Instead of Fighting Over Children: A Fast Breadth-First Search Implementation for Graph 500
This report provides a summary of an efficient breadth-first search implementation that is advantageous for social networks. This implementation uses a hybrid approach, combining a conventional ...