Publications
Generic topology mapping strategies for large-scale parallel architectures
TLDR
This work presents a fast and efficient new heuristic based on graph similarity, shows its utility with application communication patterns on real topologies, and demonstrates that the benefit of topology mapping grows with the network size.
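The paper's actual algorithm is more involved, but the flavor of a greedy communication-aware mapping heuristic can be sketched in a few lines of Python. Everything here is illustrative: `greedy_map`, `comm`, and `topo` are hypothetical names, and this is not the heuristic from the paper.

```python
from collections import deque

def bfs_dist(adj, src):
    """Hop distances from src in an unweighted topology graph."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def greedy_map(comm, topo):
    """comm: {proc: {peer: bytes}}; topo: {node: [neighbor, ...]}.
    Greedily place each process on the free node that minimizes
    traffic-weighted hop distance to its already-placed peers."""
    dists = {n: bfs_dist(topo, n) for n in topo}
    order = sorted(comm, key=lambda p: -sum(comm[p].values()))
    placement, free = {}, set(topo)
    for p in order:
        def cost(n):
            return sum(vol * dists[n][placement[q]]
                       for q, vol in comm[p].items() if q in placement)
        placement[p] = min(free, key=cost)
        free.remove(placement[p])
    return placement

# Tiny example: a 3-process chain mapped onto a 3-node line topology
comm = {0: {1: 10}, 1: {0: 10, 2: 5}, 2: {1: 5}}
topo = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(greedy_map(comm, topo))
```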
Slim Fly: A Cost Effective Low-Diameter Network Topology
TLDR
This work proposes deadlock-free routing schemes and physical layouts for large computing centres, as well as a detailed cost and power model, for Slim Fly, a high-performance, cost-effective network topology that approaches the theoretically optimal network diameter.
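For context on "theoretically optimal network diameter": the benchmark is the Moore bound from graph theory, which caps how many nodes a network of a given radix and diameter can contain; Slim Fly's construction operates close to this bound at diameter 2. A short statement of the bound (standard graph theory, not quoted from the paper):

```latex
% Moore bound: maximum node count for radix d and diameter k
N_{\max}(d,k) \le 1 + d\sum_{i=0}^{k-1}(d-1)^i
% At diameter k = 2:
N_{\max}(d,2) \le 1 + d + d(d-1) = d^2 + 1
```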
The Convergence of Sparsified Gradient Methods
TLDR
It is proved that, under analytic assumptions, sparsifying gradients by magnitude with local error correction provides convergence guarantees for data-parallel SGD, for both convex and non-convex smooth objectives.
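The mechanism the result covers, top-k magnitude sparsification with local error correction (often called error feedback), is compact enough to sketch. A minimal NumPy version from a single worker's point of view; `sparsify_with_memory` is a hypothetical name, not the authors' code:

```python
import numpy as np

def sparsify_with_memory(grad, memory, k):
    """Transmit only the k largest-magnitude entries of the
    error-corrected gradient; the dropped remainder is kept
    locally and re-added at the next step."""
    corrected = grad + memory
    idx = np.argpartition(np.abs(corrected), -k)[-k:]  # top-k by magnitude
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]
    return sparse, corrected - sparse  # (to send, new local memory)
```

Each worker would apply this before communicating its gradient; the locally accumulated memory is what lets convergence guarantees survive the aggressive compression.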
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis.
TLDR
The problem of parallelizing DNNs is described from a theoretical perspective, approaches to it are surveyed, and potential directions for parallelism in deep learning are extrapolated.
DARE: High-Performance State Machine Replication on RDMA Networks
TLDR
A new set of protocols based on Remote Direct Memory Access (RDMA) primitives, demonstrated with a strongly consistent key-value store, is proposed that enables operators to fully utilize the new capabilities of the quickly growing number of RDMA-capable datacenter networks.
LogGOPSim: simulating large-scale applications in the LogGOPS model
We introduce LogGOPSim, a fast simulation framework for parallel algorithms at large scale. LogGOPSim utilizes a slightly extended version of the well-known LogGPS model in combination with full MPI …
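For orientation, the LogGP family that LogGPS and LogGOPS extend prices a point-to-point message with a handful of parameters: latency L, per-message CPU overhead o, and per-byte gap G. A textbook LogGP cost estimate (illustrative only; the simulator's LogGOPS model adds further parameters such as O and S):

```python
def loggp_msg_time(s, L, o, G):
    """Textbook LogGP time for one s-byte message:
    sender overhead + payload gap + wire latency + receiver overhead."""
    return o + (s - 1) * G + L + o

# Hypothetical parameters: L=5us, o=1.5us, G=0.002us/byte
print(loggp_msg_time(1024, 5.0, 1.5, 0.002))  # ~10.05 us
```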
The PERCS High-Performance Interconnect
TLDR
The Blue Waters system, which is being constructed at NCSA, is an exemplar large-scale PERCS installation that is expected to deliver sustained petascale performance over a wide range of applications.
Characterizing the Influence of System Noise on Large-Scale Applications by Simulation
TLDR
An in-depth analysis of the impact of system noise on large-scale parallel application performance in realistic settings shows that not only collective operations but also point-to-point communications influence the application's sensitivity to noise.
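The intuition behind noise sensitivity in synchronizing collectives can be shown with a toy Monte Carlo model (entirely illustrative; the paper uses detailed trace-driven simulation, not this): each process independently suffers random OS noise, and a barrier-like collective finishes only when the slowest process does, so the expected penalty grows with process count.

```python
import numpy as np

rng = np.random.default_rng(0)

def collective_time(P, t_compute=100.0, p_noise=0.01, t_noise=25.0, trials=10_000):
    """Each process computes for t_compute us; with probability p_noise
    it also absorbs a t_noise detour. The collective completes at the
    maximum over processes; return the mean over Monte Carlo trials."""
    noise = rng.binomial(1, p_noise, size=(trials, P)) * t_noise
    return (t_compute + noise).max(axis=1).mean()

for P in (16, 256, 4096):
    print(P, round(collective_time(P), 1))  # penalty grows with P
```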
Neural Code Comprehension: A Learnable Representation of Code Semantics
TLDR
A novel processing technique to learn code semantics is presented and applied to a variety of program analysis tasks; it is shown that even without fine-tuning, a single RNN architecture with fixed inst2vec embeddings outperforms specialized approaches for performance prediction and algorithm classification from raw code.