The Convergence of Sparsified Gradient Methods
- Dan Alistarh, T. Hoefler, M. Johansson, Sarit Khirirat, Nikola Konstantinov, Cédric Renggli
- Computer Science, NeurIPS
- 27 September 2018
It is proved that, under analytic assumptions, sparsifying gradients by magnitude with local error correction provides convergence guarantees for data-parallel SGD, for both convex and non-convex smooth objectives.
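For intuition, here is a minimal NumPy sketch of the scheme the paper analyzes: each worker transmits only the k largest-magnitude gradient entries and folds the dropped remainder into a local memory that is added back at the next step. The function name and structure are illustrative, not the authors' implementation.

```python
import numpy as np

def topk_sparsify_with_memory(grad, memory, k):
    """One step of magnitude sparsification with local error correction.

    Illustrative sketch: top-k selection by magnitude plus error feedback.
    """
    corrected = grad + memory                 # add back previously dropped mass
    flat = np.abs(corrected).ravel()
    idx = np.argpartition(flat, -k)[-k:]      # indices of k largest magnitudes
    mask = np.zeros(corrected.size, dtype=bool)
    mask[idx] = True
    mask = mask.reshape(corrected.shape)
    sparse = np.where(mask, corrected, 0.0)   # what the worker transmits
    new_memory = corrected - sparse           # error kept locally
    return sparse, new_memory
```

Each worker sends only `sparse`; carrying `new_memory` forward into the next step is the local error correction that the convergence guarantees rely on.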
Generic topology mapping strategies for large-scale parallel architectures
This work presents a fast new heuristic based on graph similarity, shows its utility with application communication patterns on real topologies, and demonstrates that the benefit of topology mapping grows with network size.
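As a point of reference for what topology mapping computes, here is a sketch of a generic greedy mapping baseline (not the paper's graph-similarity heuristic): processes are placed one by one so that heavy communication lands on nearby nodes. The matrices `comm` and `dist` and the placement order are illustrative assumptions.

```python
import numpy as np

def greedy_topology_mapping(comm, dist):
    """Greedy process-to-node mapping baseline (illustrative).

    comm: P x P application communication volume matrix.
    dist: N x N network hop-distance matrix (N >= P).
    Returns a process -> node assignment.
    """
    P, N = comm.shape[0], dist.shape[0]
    assignment = {}
    free_nodes = set(range(N))
    # place processes in order of decreasing total traffic
    order = np.argsort(-comm.sum(axis=1))
    for p in order:
        best_node, best_cost = None, np.inf
        for n in free_nodes:
            # communication cost of placing p on n, given placed peers
            cost = sum(comm[p, q] * dist[n, assignment[q]]
                       for q in assignment)
            if cost < best_cost:
                best_node, best_cost = n, cost
        assignment[p] = best_node
        free_nodes.remove(best_node)
    return assignment
```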
Slim Fly: A Cost Effective Low-Diameter Network Topology
This work introduces Slim Fly, a high-performance, cost-effective network topology that approaches the theoretically optimal network diameter, and proposes deadlock-free routing schemes, physical layouts for large computing centres, and a detailed cost and power model for it.
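The "theoretically optimal network diameter" refers to the degree-diameter (Moore) bound, which limits how many endpoints a network of radix k can reach within diameter D; Slim Fly targets diameter 2:

```latex
% Moore bound: maximum number of endpoints N in a network of
% radix (degree) k and diameter D; Slim Fly targets D = 2.
N \le 1 + k \sum_{i=0}^{D-1} (k-1)^i
  \quad\Rightarrow\quad
N \le 1 + k^2 \quad \text{for } D = 2.
```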
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis.
Deep neural network training is analyzed from a concurrency-theoretic perspective, approaches for parallelizing it are surveyed, and potential directions for parallelism in deep learning are extrapolated.
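The two basic partitionings such an analysis contrasts can be shown in a few lines of NumPy; the sizes and split factors below are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 4))   # batch of 8 samples, 4 features
W = rng.standard_normal((4, 6))   # layer weights

# Data parallelism: split the batch across two workers, each holding
# a full copy of W; per-worker outputs are concatenated row-wise.
y_data = np.vstack([Xi @ W for Xi in np.split(X, 2, axis=0)])

# Model parallelism: split W's output columns across two workers,
# each seeing the full batch; outputs are concatenated column-wise.
y_model = np.hstack([X @ Wi for Wi in np.split(W, 2, axis=1)])

assert np.allclose(y_data, X @ W) and np.allclose(y_model, X @ W)
```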
Neural Code Comprehension: A Learnable Representation of Code Semantics
A novel processing technique is presented to learn code semantics and is applied to a variety of program analysis tasks; even without fine-tuning, a single RNN architecture with fixed inst2vec embeddings outperforms specialized approaches for performance prediction and algorithm classification from raw code.
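A toy sketch of the pipeline's shape, assuming random stand-in embeddings (inst2vec learns its embeddings from LLVM IR contexts and then keeps them frozen): statement IDs are looked up in a fixed table and fed through a plain tanh RNN whose final state is classified. All sizes and weights here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB, DIM, HID, CLASSES = 100, 16, 32, 4

E = rng.standard_normal((VOCAB, DIM))        # frozen embedding table (stand-in)
Wx = rng.standard_normal((DIM, HID)) * 0.1
Wh = rng.standard_normal((HID, HID)) * 0.1
Wo = rng.standard_normal((HID, CLASSES)) * 0.1

def classify(statement_ids):
    """Run a plain tanh RNN over embedded statement IDs and classify
    the final hidden state (illustrative pipeline shape)."""
    h = np.zeros(HID)
    for t in statement_ids:
        h = np.tanh(E[t] @ Wx + h @ Wh)
    return int(np.argmax(h @ Wo))

print(classify([3, 17, 42, 8]))  # predicted class for a toy IR sequence
```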
DARE: High-Performance State Machine Replication on RDMA Networks
A new set of protocols based on Remote Direct Memory Access (RDMA) primitives is proposed and demonstrated with a strongly consistent key-value store, enabling operators to fully utilize the capabilities of the quickly growing number of RDMA-capable datacenter networks.
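A conceptual toy of the key idea, not DARE's actual protocol: the leader appends an entry directly into follower log buffers, standing in for one-sided RDMA writes that bypass follower CPUs, and commits once a majority of copies exist. All class and method names below are hypothetical.

```python
class Replica:
    def __init__(self):
        self.log = []          # memory region a leader could write remotely

class Leader:
    """Toy leader-based replication: direct appends into follower logs
    stand in for one-sided RDMA writes; commit requires a majority."""
    def __init__(self, followers):
        self.log = []
        self.followers = followers
        self.commit_index = -1

    def replicate(self, command):
        self.log.append(command)
        copies = 1                             # the leader's own copy
        for f in self.followers:
            f.log.append(command)              # "RDMA write" into remote log
            copies += 1
        if copies > (len(self.followers) + 1) // 2:
            self.commit_index = len(self.log) - 1
        return self.commit_index

leader = Leader([Replica(), Replica()])
print(leader.replicate("put k v"))  # -> 0: committed at index 0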
LogGOPSim: simulating large-scale applications in the LogGOPS model
We introduce LogGOPSim, a fast simulation framework for parallel algorithms at large scale. LogGOPSim utilizes a slightly extended version of the well-known LogGPS model in combination with full MPI…
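For context, the LogGP cost of a single k-byte message is sender overhead plus per-byte gap for the payload plus wire latency plus receiver overhead; LogGPS, which LogGOPSim extends, additionally models rendezvous synchronization above a message-size threshold S (omitted here). Parameter names follow the model; the function itself is an illustrative sketch.

```python
def loggp_msg_time(k, L, o, G):
    """Time for one k-byte point-to-point message under LogGP:
    sender overhead o + payload gap (k-1)*G + latency L + receiver
    overhead o. LogGPS's rendezvous threshold S is not modeled."""
    return o + (k - 1) * G + L + o

# e.g. a 1 KiB message with L=2us, o=1.5us, G=0.004us/byte
print(loggp_msg_time(1024, L=2.0, o=1.5, G=0.004))  # microseconds
```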
The PERCS High-Performance Interconnect
- L. B. Arimilli, Ravi Arimilli, R. Rajamony
- Computer Science, 18th IEEE Symposium on High Performance…
- 18 August 2010
The Blue Waters System, which is being constructed at NCSA, is an exemplar large-scale PERCS installation that is expected to deliver sustained petascale performance over a wide range of applications.
Characterizing the Influence of System Noise on Large-Scale Applications by Simulation
- T. Hoefler, Timo Schneider, A. Lumsdaine
- Computer Science, ACM/IEEE International Conference for High…
- 13 November 2010
An in-depth analysis of the impact of system noise on large-scale parallel application performance in realistic settings shows that not only collective operations but also point-to-point communications influence the application's sensitivity to noise.
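A toy Monte Carlo model (not the paper's simulator) shows why sensitivity to noise grows with scale: a synchronizing operation finishes only when the slowest of P processes does, so rare per-process detours are amplified at large P. All parameters below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

def step_time(P, work=1.0, noise_prob=0.01, noise_len=0.5, iters=1000):
    """Mean time per synchronized step across P processes: each process
    computes for `work` and suffers a `noise_len` detour with probability
    `noise_prob`; a step ends when the slowest process finishes."""
    noise = rng.random((iters, P)) < noise_prob
    per_proc = work + noise * noise_len
    return per_proc.max(axis=1).mean()

for P in (1, 64, 4096):
    print(P, round(step_time(P), 4))
# the mean step time grows with P: at scale, nearly every step hits noise
```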
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
- T. Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste
- Computer Science, J. Mach. Learn. Res.
- 31 January 2021
This work describes approaches to remove and add elements of neural networks, training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice; it also defines a metric of pruned parameter efficiency that could serve as a baseline for comparing different sparse networks.
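As one concrete instance of the removal schemes surveyed, here is a minimal sketch of global magnitude pruning with an explicit mask; the function name and tie-handling are illustrative assumptions.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of weights;
    return the pruned tensor and a binary mask of kept positions."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)             # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask, mask

W = np.random.default_rng(3).standard_normal((4, 4))
Wp, mask = magnitude_prune(W, 0.75)
print(mask.sum(), "of", mask.size, "weights kept")
```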