Sesh Venugopal

Clustering is a mapping of the nodes of a task graph onto labeled clusters. We present a unified framework for clustering of directed acyclic graphs (DAGs). Several clustering algorithms from the literature are compared using this framework. For coarse grain DAGs two interesting properties are presented. For every nonlinear clustering there exists a linear(More)
We propose a sparse Cholesky factorization scheme based on a static task and communication schedule generated by symbolic preprocessing of the inter-column dependencies. This information is used to reduce the overheads of maintaining data structures and the communication costs incurred in message passing during the numerical factorization step. We introduce(More)
We present a block-based, automatic partitioning and scheduling methodology for sparse matrix factorization on distributed memory systems. Using experimental results, we analyze this technique for communication and load imbalance overhead. To study the performance effects, we compare these overheads with those in a straightforward "wrap-mapped"(More)
This innovative new book encourages readers to utilize the “Outside-In” approach to learning the use, design and implementation of data structures. The author introduces every data structure by first narrating its properties and use in applications (the "outside" view). This provides a clear introduction to data structures with realistic context where(More)
In this paper, we describe efficient, scalable, and deadlock-free asynchronous communication strategies suitable for unstructured computations on iPSC/860. Using these deadlock-free strategies, which incur small overhead, we have optimized the communication in parallel sparse Cholesky factorization. We present experimental results to show that such(More)
The problem of cache thrashing occurs in shared memory multiprocessors with local processor caches and a write-invalidate cache coherency protocol. Cache conflict, one of the causes of thrashing, arises when several processors try to write into different addresses of the same cache line. It is shown that parallel execution of loop iterations on a machine(More)