Harsha Vardhan Simhadri

Learn More
Perfectly secure message transmission (PSMT), a problem formulated by Dolev, Dwork, Waarts and Yung, involves a sender S and a recipient R who are connected by n synchronous channels of which up to t may be corrupted by an active adversary. The goal is to transmit, with perfect security, a message from S to R. PSMT is achievable if and only if n > 2t. For(More)
This announcement describes the problem based benchmark suite (PBBS). PBBS is a set of benchmarks designed for comparing parallel algorithmic approaches, parallel programming language styles, and machine architectures across a broad set of problems. Each benchmark is defined concretely in terms of a problem specification and a set of input distributions. No(More)
In this paper we explore a simple and general approach for developing parallel algorithms that lead to good cache complexity on parallel machines with private or shared caches. The approach is to design nested-parallel algorithms that have low depth (span, critical path length) and for which the natural sequential evaluation order has low cache complexity(More)
For nested-parallel computations with low depth (span, critical path length) analyzing the work, depth, and <i>sequential</i> cache complexity suffices to attain reasonably strong bounds on the <i> parallel</i> runtime and cache complexity on machine models with either shared or private caches. These bounds, however, do not extend to general hierarchical(More)
The running time of nested parallel programs on shared-memory machines depends in significant part on how well the scheduler mapping the program to the machine is optimized for the organization of caches and processor cores on the machine. Recent work proposed &#8220;space-bounded schedulers&#8221; for scheduling such programs on the multilevel cache(More)
This paper formalizes and studies <i>combinable memory-block transactions (MBTs)</i>. The idea is to encode short programs that operate on a single cache/memory block and then to specify such a program with a memory request. The code is then executed at the cache or memory controller, atomically with respect to other accesses to that block by this or other(More)
This paper presents the design, analysis, and implementation of parallel and sequential I/O-efficient algorithms for set cover, tying together the line of work on parallel set cover and the line of work on efficient set cover algorithms for large, disk-resident instances. Our contributions are twofold: First, we design and analyze a parallel(More)
—Communication, i.e., moving data between levels of a memory hierarchy or between processors over a network, is much more expensive (in time or energy) than arithmetic. There has thus been a recent focus on designing algorithms that minimize communication and, when possible, attain lower bounds on the total number of reads and writes. However, most previous(More)
Cache-oblivious algorithms have the advantage of achieving good sequential cache complexity across all levels of a multi-level cache hierarchy, regardless of the specifics (cache size and cache line size) of each level. In this paper, we describe cache-oblivious sorting algorithms with optimal work, optimal cache complexity and polylogarithmic depth. Using(More)