Aravind Sukumaran-Rajam

Thread-Level Speculation (TLS) is a dynamic code parallelization technique proposed to keep software in step with advances in hardware, in particular to automatically parallelize programs so that they take advantage of multicore processors. Being speculative, frameworks of this type unavoidably rely on verification systems similar to software …
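To make the speculative execute-then-verify idea concrete, here is a minimal Python sketch of a TLS-style runtime check: iterations of a chunk run against private copies of the shared state while their read and write sets are recorded, and the chunk commits only if no cross-iteration conflict is found. All names (e.g. `run_chunk_speculatively`) are hypothetical; this is a conceptual illustration, not the verification system of any particular TLS framework.

```python
# Conceptual sketch of thread-level speculation (TLS): iterations of a chunk
# execute against a private copy of memory while their read/write sets are
# recorded; the chunk commits only if no cross-iteration conflict is found,
# otherwise it is rolled back (and would be re-executed sequentially).
# All names here are illustrative, not from any specific TLS framework.

def run_chunk_speculatively(state, iterations, body):
    """Execute `body(i, read, write)` for each i, privately and instrumented."""
    logs = []
    for i in iterations:
        shadow = dict(state)          # private copy: writes stay local
        reads, writes = set(), set()

        def read(addr):
            reads.add(addr)
            return shadow[addr]

        def write(addr, value):
            writes.add(addr)
            shadow[addr] = value

        body(i, read, write)
        logs.append((i, reads, writes, shadow))

    # Verification: a later iteration must not read or write a location
    # written by an earlier one in the same speculative chunk.
    written_so_far = set()
    for i, reads, writes, _ in logs:
        if (reads | writes) & written_so_far:
            return False              # conflict -> roll back, run sequentially
        written_so_far |= writes

    # Commit: apply the speculative writes in iteration order.
    for _, _, writes, shadow in logs:
        for addr in writes:
            state[addr] = shadow[addr]
    return True


# Toy usage: a[i] = a[i] + 1 has no cross-iteration dependence, so it commits.
if __name__ == "__main__":
    a = {("a", i): i for i in range(8)}
    ok = run_chunk_speculatively(
        a, range(8),
        lambda i, rd, wr: wr(("a", i), rd(("a", i)) + 1))
    print("committed:", ok, "->", [a[("a", i)] for i in range(8)])
```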
We present a dynamic dependence analyzer whose goal is to compute dependences from instrumented execution samples of loop nests. The resulting information serves as a prediction of the execution behavior during the remaining iterations and can be used to select and apply a speculatively optimizing and parallelizing polyhedral transformation of the target …
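As a rough illustration of what such an analyzer could compute from a sampled trace, the sketch below collects (iteration, address, kind) records, derives read-after-write dependence distances, and fits a linear address function of the loop counter that can predict the accesses of the remaining iterations. The function `analyze_sample` and the trace format are assumptions made for this example, not the paper's interface.

```python
# Conceptual sketch (not the paper's analyzer): from an instrumented sample of
# iterations, collect (iteration, address, kind) access records, derive flow
# (read-after-write) dependence distances, and fit a linear model
# address = base + stride * i that can predict accesses of later iterations.

def analyze_sample(trace):
    """trace: list of (iteration, address, 'R' or 'W') from a sampled run."""
    last_write = {}        # address -> iteration of the latest write
    distances = set()      # observed flow-dependence distances
    writes = {}            # iteration -> write address (one write per iteration assumed)

    for it, addr, kind in trace:
        if kind == 'R' and addr in last_write and last_write[addr] < it:
            distances.add(it - last_write[addr])
        if kind == 'W':
            last_write[addr] = it
            writes[it] = addr

    # Fit address = base + stride * i to the sampled writes; if every sampled
    # write matches, the model predicts the accesses of remaining iterations.
    its = sorted(writes)
    linear = None
    if len(its) >= 2:
        i0, i1 = its[0], its[1]
        stride = (writes[i1] - writes[i0]) // (i1 - i0)
        base = writes[i0] - stride * i0
        if all(writes[i] == base + stride * i for i in its):
            linear = (base, stride)
    return distances, linear


# Toy usage: each iteration writes a[i] (8-byte stride from base 1000) and
# reads a[i-2], so the analyzer reports distance 2 and the linear model (1000, 8).
trace = []
for i in range(4):
    if i >= 2:
        trace.append((i, 1000 + 8 * (i - 2), 'R'))
    trace.append((i, 1000 + 8 * i, 'W'))
print(analyze_sample(trace))
```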
Matrix factorization of an incomplete matrix is useful in applications such as recommender systems. Several iterative algorithms have been proposed for this problem, including Cyclic Coordinate Descent (CCD). Recently, a variant of CCD called CCD++ was developed as an attractive algorithm for parallel implementation on …
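For reference, plain CCD on an incomplete matrix can be sketched as follows: the observed entries of R are approximated by W·Hᵀ, and each latent coordinate of W and H is updated in turn by a closed-form one-dimensional least-squares step over the current residual. The NumPy sketch below shows sequential CCD only; it is not the CCD++ variant or any particular parallel implementation.

```python
# Minimal sketch of Cyclic Coordinate Descent (CCD) for factorizing an
# incomplete ratings matrix R ~= W @ H.T on the observed entries only.
# Plain sequential CCD; not CCD++ and not a parallel implementation.

import numpy as np

def ccd(R, mask, k=2, lam=0.1, n_iters=50, seed=0):
    """R: m x n ratings; mask: 1 where R is observed; returns W (m x k), H (n x k)."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    W = 0.1 * rng.standard_normal((m, k))
    H = 0.1 * rng.standard_normal((n, k))
    E = mask * (R - W @ H.T)                          # residual on observed entries

    for _ in range(n_iters):
        for t in range(k):                            # cycle over latent coordinates
            Et = E + mask * np.outer(W[:, t], H[:, t])   # residual without coordinate t
            # Update column W[:, t] with H fixed (closed-form 1-D least squares).
            for i in range(m):
                num = Et[i] @ H[:, t]
                den = lam + mask[i] @ (H[:, t] ** 2)
                W[i, t] = num / den
            # Update column H[:, t] with W fixed.
            for j in range(n):
                num = Et[:, j] @ W[:, t]
                den = lam + mask[:, j] @ (W[:, t] ** 2)
                H[j, t] = num / den
            E = Et - mask * np.outer(W[:, t], H[:, t])   # restore the full residual
    return W, H


# Toy usage: recover a rank-2 matrix with roughly 30% of its entries hidden.
rng = np.random.default_rng(1)
truth = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
mask = (rng.random(truth.shape) > 0.3).astype(float)
W, H = ccd(truth, mask, k=2)
print("observed RMSE:",
      np.sqrt(((mask * (truth - W @ H.T)) ** 2).sum() / mask.sum()))
```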
Runtime loop optimization and speculative execution are becoming increasingly prominent as ways to leverage performance in the current multi-core and many-core era. However, wider and more efficient use of such techniques is mainly hampered by the prohibitive time overhead induced by centralized data race detection, dynamic code behavior modeling, and code …
A few weeks ago, we were glad to announce the first release of Apollo, the Automatic speculative POLyhedral Loop Optimizer. Apollo applies polyhedral optimizations on the fly to loop nests whose control flow and memory access patterns cannot be determined at compile time. In contrast to existing tools, Apollo can handle any kind of loop nest whose memory …
Sparse matrix-matrix multiplication (SpGEMM) is an important primitive for many data analytics algorithms, such as Markov clustering. Unlike the dense case, where the performance of matrix-matrix multiplication is considerably higher than that of matrix-vector multiplication, the opposite is true for the sparse case on GPUs. A significant challenge is that the sparsity …
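A simple way to see where the irregularity comes from is the classic row-by-row (Gustavson-style) SpGEMM formulation sketched below: each output row gathers contributions from an input-dependent set of rows of B, so the amount of work and the output sparsity pattern are only known at run time. The function `spgemm_csr` is an illustrative CPU sketch, not a GPU implementation from the paper.

```python
# Minimal sketch of row-by-row (Gustavson-style) SpGEMM on CSR matrices,
# using a dense accumulator per output row. It illustrates the irregular,
# input-dependent access pattern that makes GPU SpGEMM hard; it is not any
# particular GPU implementation.

import numpy as np
import scipy.sparse as sp

def spgemm_csr(A, B):
    """Compute C = A @ B for CSR matrices A (m x k) and B (k x n)."""
    m, _ = A.shape
    _, n = B.shape
    indptr, indices, data = [0], [], []
    acc = np.zeros(n)                     # dense accumulator for one row of C

    for i in range(m):
        touched = set()
        for jj in range(A.indptr[i], A.indptr[i + 1]):
            j, a_ij = A.indices[jj], A.data[jj]
            for kk in range(B.indptr[j], B.indptr[j + 1]):
                k = B.indices[kk]
                touched.add(k)
                acc[k] += a_ij * B.data[kk]
        for k in sorted(touched):         # emit the nonzeros of row i
            indices.append(k)
            data.append(acc[k])
            acc[k] = 0.0
        indptr.append(len(indices))
    return sp.csr_matrix((data, indices, indptr), shape=(m, n))


# Toy usage: compare against SciPy's built-in sparse product.
A = sp.random(50, 40, density=0.05, format="csr", random_state=0)
B = sp.random(40, 60, density=0.05, format="csr", random_state=1)
C = spgemm_csr(A, B)
print("max abs diff vs scipy:", np.abs((C - A @ B).toarray()).max())
```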