Sayantan Chakravorty

Large machines with tens or even hundreds of thousands of processors are currently in use. Fault tolerance is an important issue for these and the even larger machines of the future. Checkpoint-based methods, currently used on most machines, roll back all processors to previous checkpoints after a crash. This wastes a significant amount of computation as all…
Failures are likely to be more frequent in systems with thousands of processors. Therefore, schemes for dealing with faults become increasingly important. In this paper, we present a fault tolerance solution for parallel applications that proactively migrates execution from processors where failure is imminent. Our approach assumes that some failures are…
Unstructured meshes are used in many engineering applications with irregular domains, from elastic deformation problems to crack propagation to fluid flow. Because of their complexity and dynamic behavior, the development of scalable parallel software for these applications is challenging. The Charm++ Parallel Framework for Unstructured Meshes allows one…
High-performance systems with thousands of processors have been introduced in the recent past, and systems with hundreds of thousands of processors should become available in the near future. Since failures are likely to be frequent in such systems, schemes for dealing with faults are important. In this paper, we introduce a new fault tolerance solution for…
Finite element simulations of dynamic fracture problems usually require very fine discretizations in the vicinity of the propagating stress waves and advancing crack fronts, while coarser meshes can be used in the remainder of the domain. This need for a constantly evolving discretization poses several challenges, especially when the simulation is performed…
Programming paradigms are designed to express algorithms elegantly and efficiently. There are many parallel programming paradigms, each suited to a certain class of problems. Selecting the best parallel programming paradigm for a problem minimizes programming effort and maximizes performance. Given the increasing complexity of parallel applications, no one…
Traditional full-featured operating systems are known to have properties that limit the scalability of distributed-memory parallel programs, the most common programming paradigm used in high-end computing. Furthermore, as processor counts increase with the most capable systems, the necessary activity to manage the system becomes more of a burden. To…
The Finite Element Method framework allows users to develop scalable parallel finite element applications easily. During initialization, it reads in an input mesh and partitions it into a large number of chunks that are distributed among the processors. This partitioning process is sequential and memory-intensive. Thus the partitioning algorithm is a…