Publications
XSEDE: Accelerating Scientific Discovery
TLDR
XSEDE's integrated, comprehensive suite of advanced digital services federates with other high-end facilities and with campus-based resources, serving as the foundation for a national e-science infrastructure ecosystem.
Scalable GPU graph traversal
TLDR
This work presents a BFS parallelization focused on fine-grained task management constructed from efficient prefix sum computations that achieves an asymptotically optimal O(|V| + |E|) work complexity.
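The role of prefix sums in this kind of BFS can be illustrated with a small sequential Python sketch. It assumes the graph is stored in compressed sparse row (CSR) form using the hypothetical arrays `row_offsets` and `column_indices`, and it simulates each data-parallel step with an ordinary loop; it is not the authors' GPU implementation. The point it shows is that an exclusive prefix sum over the out-degrees of the frontier vertices gives every vertex a private, non-overlapping slice of the next edge frontier, so no two vertices compete for the same output position.

```python
from typing import List, Tuple


def exclusive_scan(values: List[int]) -> Tuple[List[int], int]:
    """Exclusive prefix sum: out[i] = sum(values[:i]); also returns the total."""
    out: List[int] = []
    running = 0
    for v in values:
        out.append(running)
        running += v
    return out, running


def bfs_levels(row_offsets: List[int], column_indices: List[int], source: int) -> List[int]:
    """BFS depth of every vertex (-1 if unreachable) on a CSR graph.

    Each loop below is a sequential stand-in for one data-parallel step.
    """
    n = len(row_offsets) - 1
    depth = [-1] * n
    depth[source] = 0
    frontier = [source]
    level = 0
    while frontier:
        # Step 1: every frontier vertex reports its out-degree.
        degrees = [row_offsets[v + 1] - row_offsets[v] for v in frontier]
        # Step 2: an exclusive scan turns degrees into disjoint write offsets.
        offsets, total = exclusive_scan(degrees)
        # Step 3: each vertex copies its adjacency list into its own slice.
        edge_frontier = [0] * total
        for v, base in zip(frontier, offsets):
            start, end = row_offsets[v], row_offsets[v + 1]
            edge_frontier[base:base + (end - start)] = column_indices[start:end]
        # Step 4: keep only neighbors not yet visited; marking depth here
        # also drops duplicates that appear within the same edge frontier.
        level += 1
        next_frontier = []
        for u in edge_frontier:
            if depth[u] == -1:
                depth[u] = level
                next_frontier.append(u)
        frontier = next_frontier
    return depth
```

For the edges 0→1, 0→2, 1→3, 2→3, 3→4, `bfs_levels([0, 2, 3, 4, 5, 5], [1, 2, 3, 3, 4], 0)` returns `[0, 1, 1, 2, 3]`. In this sketch each vertex enters a frontier at most once and each edge is gathered at most once, which is where the O(|V| + |E|) work bound comes from.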
The Legion vision of a worldwide virtual computer
TLDR
The notion of a worldwide computer, now taking shape through the Legion project, distributes computation like the World-Wide Web distributes multimedia, creating the illusion for users of a very, very powerful desktop computer.
High Performance and Scalable Radix Sorting: a Case Study of Implementing Dynamic Parallelism for GPU Computing
TLDR
This work presents a family of very efficient parallel algorithms for radix sorting, together with the authors' allocation-oriented algorithmic design strategies that match the strengths of GPU processor architecture to this genre of dynamic parallelism.
The Legion Resource Management System
TLDR
The Legion resource management system is flexible both in its support for system-level resource management and in its adaptability to user-level scheduling policies.
Revisiting sorting for GPGPU stream architectures
TLDR
This poster presents efficient strategies for sorting large sequences of fixed-length keys (and values) on GPGPU stream processors, built on a parallel scan stream primitive that has been generalized in two ways: with local interfaces for producer/consumer operations (visiting logic), and with interfaces for performing multiple related, concurrent prefix scans (multi-scan).
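How a radix sort reduces to prefix scans can be shown with a sequential Python sketch of a scan-based least-significant-digit (LSD) radix sort. It rests on assumptions of our own: keys only (no values), non-negative integers below `2**key_bits`, and the hypothetical helper names `multi_scan` and `radix_sort`. For each digit pass it builds one 0/1 flag vector per digit value, runs all of those exclusive scans together as a stand-in for a multi-scan primitive, and scatters each key to its bucket's base offset plus its rank inside the bucket. It illustrates the data flow only and is not the poster's implementation.

```python
from typing import List, Tuple


def multi_scan(flag_vectors: List[List[int]]) -> Tuple[List[List[int]], List[int]]:
    """Exclusive prefix sum of each flag vector, plus each vector's total.

    Sequential stand-in for running several related scans concurrently.
    """
    ranks: List[List[int]] = []
    totals: List[int] = []
    for flags in flag_vectors:
        running, out = 0, []
        for f in flags:
            out.append(running)
            running += f
        ranks.append(out)
        totals.append(running)
    return ranks, totals


def radix_sort(keys: List[int], key_bits: int = 32, radix_bits: int = 4) -> List[int]:
    """LSD radix sort of non-negative integer keys below 2**key_bits."""
    buckets = 1 << radix_bits
    mask = buckets - 1
    for shift in range(0, key_bits, radix_bits):
        digits = [(k >> shift) & mask for k in keys]
        # One 0/1 flag vector per possible digit value.
        flags = [[1 if d == b else 0 for d in digits] for b in range(buckets)]
        ranks, counts = multi_scan(flags)
        # Exclusive scan of the bucket counts gives each bucket's base offset.
        bases, running = [], 0
        for c in counts:
            bases.append(running)
            running += c
        # Stable scatter: base offset of the key's bucket + its rank inside it.
        out = [0] * len(keys)
        for i, (k, d) in enumerate(zip(keys, digits)):
            out[bases[d] + ranks[d][i]] = k
        keys = out
    return keys
```

Because the scatter preserves the relative order of keys with equal digits, each pass is stable, which LSD radix sort needs for correctness. Materializing one flag vector per digit value costs O(n · 2^r) work per pass for an r-bit radix, so practical GPU sorts typically compute per-tile digit counts instead; treat this sketch as a model of the scan-and-scatter structure, not of its performance.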
Easy-to-use object-oriented parallel processing with Mentat
TLDR
The Mentat programming language, which is based on C++, is described, and performance results are presented from implementing the Mentat runtime system on a network of Sun 3 and Sun 4 workstations, the Silicon Graphics Iris, the Intel iPSC/2, and the Intel iPSC/860.
Legion: The Next Logical Step Toward a Nationwide Virtual Computer
TLDR
The coming of gigabit networks makes possible the realization of a single nationwide virtual computer composed of a variety of geographically distributed high-performance machines and workstations; the paper describes an approach to constructing and exploiting such “metasystems”.
High-Performance and Scalable GPU Graph Traversal
TLDR
This work presents a BFS parallelization focused on fine-grained task management constructed from efficient prefix sum computations that achieves an asymptotically optimal O(|V| + |E|) work complexity.
Parallel Scan for Stream Architectures
TLDR
This work presents three implementations of parallel scan that are memory-bound, utilize 100% of achievable memory bandwidth, and require only a constant amount of global device memory for storing intermediate results.
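One common way to keep a scan's intermediate storage constant in the input size is a reduce-then-scan strategy over a fixed number of tiles. The Python sketch below assumes that strategy (the summary does not say which design the paper uses) and uses the hypothetical parameter `num_tiles` in place of a fixed number of GPU thread blocks; each loop is a sequential stand-in for work done in parallel. The only intermediate array is `partials`, whose length never exceeds `num_tiles`, however long `values` is.

```python
from typing import List


def reduce_then_scan(values: List[int], num_tiles: int = 4) -> List[int]:
    """Exclusive prefix sum computed tile by tile via reduce-then-scan."""
    if num_tiles < 1:
        raise ValueError("num_tiles must be at least 1")
    n = len(values)
    if n == 0:
        return []
    tile_size = -(-n // num_tiles)  # ceiling division
    tiles = [values[i:i + tile_size] for i in range(0, n, tile_size)]

    # Phase 1: each tile reduces its slice to a single partial sum.
    partials = [sum(tile) for tile in tiles]

    # Phase 2: exclusive scan over the (at most num_tiles) partial sums.
    seeds, running = [], 0
    for p in partials:
        seeds.append(running)
        running += p

    # Phase 3: each tile scans its own slice, seeded with its tile prefix.
    out: List[int] = []
    for seed, tile in zip(seeds, tiles):
        acc = seed
        for v in tile:
            out.append(acc)
            acc += v
    return out
```

For example, `reduce_then_scan([3, 1, 7, 0, 4, 1, 6, 3], num_tiles=3)` returns `[0, 3, 4, 11, 11, 15, 16, 22]`. In this scheme the input is read twice and the output written once, so the cost is dominated by memory traffic rather than arithmetic, which is the regime a memory-bound scan targets.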