- Narayanan Sundaram, Nadathur Satish, +5 authors Pradeep Dubey
- PVLDB
- 2015

Given the growing importance of large-scale graph analytics, there is a need to improve the performance of graph analysis frameworks without compromising on productivity. GraphMat is our solution to…

We propose BlackOut, an approximation algorithm to efficiently train massive recurrent neural network language models (RNNLMs) with million word vocabularies. BlackOut is motivated by using a…

- Michael J. Anderson, Grey Ballard, James Demmel, Kurt Keutzer
- 2011 IEEE International Parallel & Distributed…
- 2011

We describe an implementation of the Communication-Avoiding QR (CAQR) factorization that runs entirely on a single graphics processor (GPU). We show that the reduction in memory traffic provided by…

- Md. Mostofa Ali Patwary, Nadathur Satish, +7 authors Pradeep Dubey
- ISC
- 2015

- Alexandru Iosup, Tim Hegeman, +11 authors Peter A. Boncz
- PVLDB
- 2016

In this paper we introduce LDBC Graphalytics, a new industrial-grade benchmark for graph analysis platforms. It consists of six deterministic algorithms, standard datasets, synthetic dataset…

- Michael J. Anderson, David Sheffield, Kurt Keutzer
- 2012 IEEE 26th International Parallel and…
- 2012

We examine the problem of solving many thousands of small dense linear algebra factorizations simultaneously on Graphics Processing Units (GPUs). We are interested in problems ranging from several…

- Md. Mostofa Ali Patwary, Surendra Byna, +7 authors Pradeep Dubey
- SC15: International Conference for High…
- 2015

Modern cosmology and plasma physics codes are now capable of simulating trillions of particles on petascale systems. Each timestep output from such simulations is on the order of 10s of TBs.…

- Forrest N. Iandola, David Sheffield, Michael J. Anderson, Phitchaya Mangpo Phothilimthana, Kurt Keutzer
- 2013 IEEE International Conference on Image…
- 2013

2D image convolution is ubiquitous in image processing and computer vision problems such as feature extraction. Exploiting parallelism is a common strategy for accelerating convolution. Parallel…

- Benjamin Armstrong, Jesse Pentzer, +6 authors Dean B. Edwards
- OCEANS 2009
- 2009

An effort has been initiated to develop a portable system capable of measuring the magnetic signature of a surface ship. The system will employ a formation of multiple AUVs, each equipped with a…

- Michael J. Anderson, Narayanan Sundaram, Nadathur Satish, Md. Mostofa Ali Patwary, Theodore L. Willke, Pradeep Dubey
- 2016 IEEE International Parallel and Distributed…
- 2016

The duality between graphs and matrices means that many common graph analyses can be expressed with primitives such as generalized sparse matrix-vector multiplication (SpMSpV) and sparse…